## Capstone Part 2

#### Idea selected:

A stratigraphic formation tops model that predicts where the desired formation tops fall on a type log. The input is a type log with a gamma ray curve and no tops; after the model runs, the type log comes back with the desired formation tops labeled.
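To train a multiclass classifier on this, each gamma ray sample needs a formation label. A minimal sketch, assuming each sample belongs to the interval between the top at or above it and the next top below it, turns a tops table into per-sample labels with `np.searchsorted`. `label_depths` is a hypothetical helper, not part of the notebook, and the example uses only the five tops printed by `type_log_tops.head()`.

```python
import numpy as np
import pandas as pd

def label_depths(depths, tops):
    """Label each depth sample with the name of the nearest top at or above it.

    Hypothetical helper. `tops` needs NAME and DEPTH columns (like
    type_log_tops); samples shallower than the first top are labeled
    'Above <first top>'.
    """
    tops = tops.sort_values('DEPTH')
    top_depths = tops['DEPTH'].to_numpy()
    names = np.array(['Above ' + tops['NAME'].iloc[0]] + tops['NAME'].tolist(), dtype=object)
    # side='right': a sample exactly at a top's depth belongs to the interval below that top
    idx = np.searchsorted(top_depths, np.asarray(depths, dtype=float), side='right')
    return names[idx]

# The five tops shown by type_log_tops.head()
tops = pd.DataFrame({
    'NAME': ['3 RD Bone Springs', 'Wolfcamp A', 'Wolfcamp B', 'Target Top', 'Target Center'],
    'DEPTH': [8428, 9453, 9939, 9975, 9985],
})
labels = label_depths([8000.0, 8428.0, 9500.0, 9980.0, 10386.0], tops)
print(list(labels))
```

Applied to the real data, something like `label_depths(type_log['DEPT'], type_log_tops)` would give the target column; the predicted tops on a new log could then be read off as the shallowest depth of each predicted label.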

#### Problem Statement:

I want to build a multiclass classification model (Decision Tree, SVM, or KNN) that takes a geologic type log as input, picks the desired geologic formation tops, and returns their depths to the user. It could also warn the user if any part of the section is missing, which we call a fault.
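The fault warning could start as a simple completeness check: compare the tops the model picked against the list of tops expected on the log. This is a sketch with a hypothetical helper (`warn_missing_sections`) and hypothetical picked depths. A missing pick only flags a *possible* fault, since it could equally be a formation the model failed to recognize, so a geologist would still need to confirm it.

```python
import warnings

def warn_missing_sections(picked_tops, expected_order):
    """Warn about expected formations that received no top pick.

    Hypothetical helper: `picked_tops` maps formation name -> predicted depth,
    and `expected_order` lists the formations expected on the log.
    """
    missing = [name for name in expected_order if name not in picked_tops]
    if missing:
        warnings.warn(
            "No top picked for: " + ", ".join(missing)
            + " (possible missing section / fault; review manually)"
        )
    return missing

# Hypothetical model output in which Wolfcamp A was not picked
expected = ['3 RD Bone Springs', 'Wolfcamp A', 'Wolfcamp B', 'Target Top']
picked = {'3 RD Bone Springs': 8430.0, 'Wolfcamp B': 9941.0, 'Target Top': 9976.0}
missing = warn_missing_sections(picked, expected)
```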

#### Data Source:

Different types of rock emit different amounts and spectra of natural gamma radiation. In particular, shales usually emit more gamma rays than other sedimentary rocks such as sandstone, gypsum, salt, coal, dolomite, or limestone. This is because radioactive potassium is a common component of their clay content, and because the cation exchange capacity of clay causes it to absorb uranium and thorium. (Source: https://en.wikipedia.org/wiki/Gamma_ray_logging)
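A common way to turn this shale versus clean-rock contrast into a number is the linear gamma ray index, which rescales a GR reading between a clean-rock baseline and a shale baseline. This is a standard petrophysics formula rather than something used elsewhere in this notebook, and the baselines below are hypothetical values in API units; in practice they are picked from the log itself.

```python
import numpy as np

def gamma_ray_index(gr, gr_clean, gr_shale):
    """Linear gamma ray index: 0 at the clean-rock baseline, 1 at the shale baseline."""
    gr = np.asarray(gr, dtype=float)
    igr = (gr - gr_clean) / (gr_shale - gr_clean)
    # readings outside the chosen baselines are clipped into [0, 1]
    return np.clip(igr, 0.0, 1.0)

# Hypothetical baselines: 20 API for clean rock, 120 API for shale
print(gamma_ray_index([20, 60, 100, 140], gr_clean=20, gr_shale=120))
```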

The tool that measures gamma radiation is a standard part of the bottom hole assembly (BHA), along with a mud motor and the drill bit. As the rig drills through the rock, the gamma ray tool records data that help geologists determine the well's position.

The rig breaks rock in two ways. The first is mechanical: a drill bit with diamond teeth turns and cuts through the rock, much like a hand drill cutting through wood. The second works like a power washer: nozzles in the bit jet drilling mud forward, and this jetting action does much of the drilling.

The drilling mud is one of the most important parts of the whole operation. Besides helping to drill the hole, it has numerous properties that are constantly checked. These properties keep the mud at a set weight and viscosity so that it holds dangerous gas in the hole; if the mud does not have the correct properties, gas can flow out of the hole and explode or burn a drilling rig down within minutes. The mud also carries the drilled rock (cuttings) to the surface and cleans the hole; the geologist on site examines these cuttings to help judge position and to check for oil content. Finally, and most importantly these days, the mud lets the people who run the gamma ray tool send signals to the surface. A small piston in the tool displaces the mud, sending Morse-code-like pressure pulses to a surface computer that collects and translates them for our normal work computers to interpret.

A second geologist looks at the resulting gamma ray curves and compares them with other gamma ray data in the area to judge the well's position in the stratigraphic column. The geologist on site and the one examining the gamma ray logs stay in constant communication, making sure the well is drilled where it is supposed to be, for the most efficient and productive wells possible.
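The point that mud weight "holds dangerous gas in the hole" comes down to hydrostatic pressure: the mud column must push back harder than the fluids in the formation. A sketch using the standard oilfield field-unit formula P (psi) = 0.052 × mud weight (ppg) × true vertical depth (ft); the well depth, mud weights, and formation pressure below are hypothetical.

```python
def hydrostatic_pressure_psi(mud_weight_ppg, tvd_ft):
    """Hydrostatic pressure of the mud column in psi.

    0.052 converts a density in pounds per gallon to a gradient in psi per foot.
    """
    return 0.052 * mud_weight_ppg * tvd_ft

def is_overbalanced(mud_weight_ppg, tvd_ft, formation_pressure_psi):
    """True if the mud column pressure exceeds the formation pressure."""
    return hydrostatic_pressure_psi(mud_weight_ppg, tvd_ft) > formation_pressure_psi

# Hypothetical well: 10 ppg mud at 10,000 ft TVD gives about 5,200 psi
print(hydrostatic_pressure_psi(10.0, 10_000))
print(is_overbalanced(10.0, 10_000, formation_pressure_psi=5_000))
```

If the mud weight dropped to 9 ppg, the column would exert about 4,680 psi, less than the hypothetical 5,000 psi formation pressure, and gas could start flowing into the hole.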

Image of a bottom hole assembly (BHA) (source: https://kineticupstream.com/products/):

![image.png](attachment:image.png)

And an image of a type log containing the gamma ray data (source: http://www-odp.tamu.edu/publications/204_IR/chap_03/c3_f50.htm):

![image-2.png](attachment:image-2.png)

#### EDA:

In [1]:
```python
import pandas as pd
import numpy as np
```

In [2]:
```python
# Formation tops picked on the type log (NAME, DEPTH)
type_log_tops = pd.read_csv('../data/type_log_tops.csv')
```

In [3]:
```python
# Type log curves sampled by depth (DEPT)
type_log = pd.read_csv('../data/type_log.csv')
```

In [4]:
```python
type_log_tops.head()
```

| | NAME | DEPTH |
|---|---|---|
| 0 | 3 RD Bone Springs | 8428 |
| 1 | Wolfcamp A | 9453 |
| 2 | Wolfcamp B | 9939 |
| 3 | Target Top | 9975 |
| 4 | Target Center | 9985 |
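The spacing between consecutive picks gives a quick sense of scale: 1,025 ft from 3 RD Bone Springs down to Wolfcamp A, but only 10 ft from Target Top to Target Center. These are distances between picks, not necessarily formation thicknesses, since picks like Target Top sit inside a formation. A sketch of the calculation on the five rows printed above; comparing such spacings on a new log against the type log could be one possible signal of missing section, though that use is an assumption here.

```python
import pandas as pd

# Rows copied from type_log_tops.head() above
tops = pd.DataFrame({
    'NAME': ['3 RD Bone Springs', 'Wolfcamp A', 'Wolfcamp B', 'Target Top', 'Target Center'],
    'DEPTH': [8428, 9453, 9939, 9975, 9985],
})
tops = tops.sort_values('DEPTH').reset_index(drop=True)
# Distance (ft) from each top down to the next pick; NaN for the deepest pick
tops['INTERVAL_FT'] = tops['DEPTH'].shift(-1) - tops['DEPTH']
print(tops)
```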


In [5]:
```python
type_log.head()
```

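| | DEPT | NPHI | GR | TENS | GRTH | LLS | LLD | MSFL | CALI | CAL1 | RHOB | DRHO | PE | DPHI | MDT | SPHI | ITTI | ITTT | XPHI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 0.3733 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 |
| 2 | 0.8733 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 |
| 3 | 1.3733 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 |
| 4 | 1.8733 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 |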


In [6]:
```python
type_log.shape
```

(21050, 19)

In [8]:
```python
type_log.dtypes
```

```
DEPT    float64
NPHI    float64
GR      float64
TENS    float64
GRTH    float64
LLS     float64
LLD     float64
MSFL    float64
CALI    float64
CAL1    float64
RHOB    float64
DRHO    float64
PE      float64
DPHI    float64
MDT     float64
SPHI    float64
ITTI    float64
ITTT    float64
XPHI    float64
dtype: object
```

In [9]:
```python
# Drop rows in which every column is NaN (e.g. row 0)
type_log = type_log.dropna(how='all')
```

In [10]:
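```python
# -999.25 is the conventional LAS null value, so a negative GR marks a
# missing reading; drop those rows
type_log = type_log.drop(type_log[type_log['GR'] < 0].index)
```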

In [11]:
```python
type_log.head()
```

| | DEPT | NPHI | GR | TENS | GRTH | LLS | LLD | MSFL | CALI | CAL1 | RHOB | DRHO | PE | DPHI | MDT | SPHI | ITTI | ITTT | XPHI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 99.8733 | 0.453 | 40.1618 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 |
| 201 | 100.3733 | 0.5271 | 42.1417 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 |
| 202 | 100.8733 | 0.4311 | 38.182 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 |
| 203 | 101.3733 | 0.4813 | 35.9287 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 |
| 204 | 101.8733 | 0.5825 | 36.8431 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 | -999.25 |


In [12]:
```python
# Keep only depth and gamma ray; the other curves are not used by the model
type_log = type_log[['DEPT', 'GR']]
```
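The cleaning steps in cells 9 to 12 will need to run on every new type log, so they could be folded into one reusable function. This is a sketch with a hypothetical `clean_type_log`; it differs slightly from the cells above because it converts the -999.25 LAS null to NaN instead of filtering `GR < 0`, so it also drops rows with a null depth, and it would keep a negative GR value that is not exactly -999.25. The sample rows mimic the head of the raw file shown earlier.

```python
import numpy as np
import pandas as pd

LAS_NULL = -999.25  # conventional null value in LAS well-log files

def clean_type_log(df, curves=('DEPT', 'GR')):
    """Keep the requested curves, turn LAS nulls into NaN, and drop incomplete rows."""
    out = df[list(curves)].replace(LAS_NULL, np.nan)
    return out.dropna()

# Rows modeled on the raw type log: an all-NaN row, a null row, then valid data
raw = pd.DataFrame({
    'DEPT': [np.nan, 0.3733, 99.8733, 100.3733],
    'NPHI': [np.nan, -999.25, 0.453, 0.5271],
    'GR':   [np.nan, -999.25, 40.1618, 42.1417],
})
clean = clean_type_log(raw)
print(clean)
```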

In [13]:
```python
type_log.head()
```

| | DEPT | GR |
|---|---|---|
| 200 | 99.8733 | 40.1618 |
| 201 | 100.3733 | 42.1417 |
| 202 | 100.8733 | 38.182 |
| 203 | 101.3733 | 35.9287 |
| 204 | 101.8733 | 36.8431 |


In [15]:
```python
type_log.tail()
```

| | DEPT | GR |
|---|---|---|
| 20770 | 10384.8733 | 107.1848 |
| 20771 | 10385.3733 | 106.4086 |
| 20772 | 10385.8733 | 104.8953 |
| 20773 | 10386.3733 | 104.9321 |
| 20774 | 10386.8733 | 105.3663 |


In [16]:
```python
type_log.shape
```

(20575, 2)
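Dropping null rows could leave holes in an evenly sampled curve, which would matter if the model later looks at windows of neighbouring samples. Here the surviving index runs from 200 to 20774, exactly 20,575 positions, matching the shape, so no rows between them were dropped. A direct check on `DEPT` does not rely on the index; this sketch assumes the 0.5 ft step seen in the printed rows, and `find_depth_gaps` is a hypothetical helper that could be run as `find_depth_gaps(type_log['DEPT'])`.

```python
import numpy as np

def find_depth_gaps(depth, step=0.5, tol=1e-6):
    """Return (upper, lower) depth pairs where consecutive samples are more than `step` apart."""
    d = np.asarray(depth, dtype=float)
    gap_idx = np.flatnonzero(np.diff(d) > step + tol)
    return [(float(d[i]), float(d[i + 1])) for i in gap_idx]

print(find_depth_gaps([100.0, 100.5, 101.0, 103.0, 103.5]))  # → [(101.0, 103.0)]
```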