# Pre-process data

Data often need some pre-processing before analysis. Here we calculate strike, append lithology data, convert depth to feet, and export the result.

In [24]:
from fractoolbox import dip2strike
import pandas as pd

## 1. Calculate strike from dip azimuth

Exports from log analysis software typically do not contain strike, only dip and dip azimuth. 

The following code calculates strike from dip azimuth using the right-hand rule.

In [25]:
picks = pd.read_csv('0_Synthetic_data.csv')
picks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 450 entries, 0 to 449
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   depth_m      450 non-null    float64
 1   dip_az       450 non-null    float64
 2   dip          450 non-null    float64
 3   aperture_mm  450 non-null    float64
 4   attribute    450 non-null    object 
dtypes: float64(4), object(1)
memory usage: 17.7+ KB


In [26]:
# apply dip2strike to each dip azimuth and store the result as a new column
picks['strike'] = picks['dip_az'].apply(dip2strike)
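
For reference, the right-hand rule places the dip direction 90 degrees clockwise of the strike direction, so strike is the dip azimuth minus 90 degrees, wrapped into the 0 to 360 range. The sketch below illustrates that arithmetic only. `strike_from_dip_azimuth` is a hypothetical helper written for this example, and it is an assumption that `dip2strike` follows the same convention.

```python
def strike_from_dip_azimuth(dip_azimuth):
    """Hypothetical helper: strike in degrees by the right-hand rule."""
    # strike sits 90 degrees anticlockwise of the dip azimuth;
    # the modulo wraps negative results back into the 0-360 range
    return (dip_azimuth - 90) % 360


print(strike_from_dip_azimuth(120))  # dip azimuth well above 90 degrees
print(strike_from_dip_azimuth(45))   # dip azimuth below 90 degrees wraps around
```

Because the function uses only arithmetic operators, it should also accept a whole pandas Series such as `picks['dip_az']` in one call.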

## 2. Append lithology type to data

Contextual data can inform our analysis. We can append lithology (and other from-to data) as attributes to the picks.

In [27]:
# make synthetic data

data = {
    'from_m': [500, 750], 
    'to_m': [750, 1000], 
    'lithology': ['Rock A', 
                 'Rock B'],
}

mudlog = pd.DataFrame(data=data)
mudlog

   from_m  to_m lithology
0     500   750    Rock A
1     750  1000    Rock B


The method provided here allows for repeated values in the lithology column by allocating a code to each depth interval. 

There are simpler approaches that can be used if the lithology column does not contain repeated values.

This method can be adapted for any from-to categorical data.

In [28]:
# number the mudlog DataFrame rows from 0 to n
mudlog['lith_num'] = range(len(mudlog))

# extract the unit tops as a list of bin edges
depth_bins = mudlog['from_m'].to_list()

# append the deepest value to the bins list
depth_bins.append(mudlog['to_m'].iloc[-1])

# make a list containing the unique codes
lith_num_label = mudlog['lith_num'].to_list()

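# append lithology numbers to the picks DataFrame based on depth
# (pd.cut bins are right-closed by default, so a pick exactly at the
# shallowest top falls outside the first bin and is left as NaN)
picks['lithology'] = pd.cut(
    picks.depth_m,
    bins=depth_bins,
    labels=lith_num_label,
)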

# build a dictionary that maps each lith_num (key) to its lithology name (value)
lith_dict = dict(zip(mudlog['lith_num'], mudlog['lithology']))

# replace the lithology number with the lithology name using the dictionary above
# (the result is assigned back to the column rather than edited in place,
# because an in-place replace on a column accessed as an attribute may not update picks)
picks['lithology'] = picks['lithology'].map(lith_dict)
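
As an illustration of the simpler approach mentioned above, the sketch below passes the lithology names directly to `pd.cut` as labels. This only works when every name in the lithology column is unique, because `pd.cut` requires unique labels for ordered categories. The `mudlog_demo` and `picks_demo` DataFrames are hypothetical stand-ins made up for this example.

```python
import pandas as pd

# hypothetical stand-ins for the mudlog and picks DataFrames
mudlog_demo = pd.DataFrame({
    'from_m': [500, 750],
    'to_m': [750, 1000],
    'lithology': ['Rock A', 'Rock B'],
})
picks_demo = pd.DataFrame({'depth_m': [600.0, 800.0, 990.0]})

# unit tops plus the deepest base give the bin edges
demo_bins = mudlog_demo['from_m'].to_list() + [mudlog_demo['to_m'].iloc[-1]]

# the lithology names are unique here, so they can be used directly as labels
picks_demo['lithology'] = pd.cut(
    picks_demo['depth_m'],
    bins=demo_bins,
    labels=mudlog_demo['lithology'].to_list(),
)

print(picks_demo)
```

If a lithology name repeats down the hole, this shortcut raises an error about non-unique labels, which is the case the numbered-interval method handles.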

## 3. Calculate pick depth in ft

In [29]:
# metres to feet: 1 ft = 0.3048 m exactly, so divide metres by 0.3048
picks['depth_ft'] = (picks['depth_m'] / 0.3048).round(2)  # rounded to 2 decimal places

## 4. Export processed data for use elsewhere

In [30]:
picks.to_csv('1_Pre-processed data.csv', index=False)