# **Tutorial 2**

### **A Machine Learning Pipeline for Seismic**

In this second tutorial, we will build a quick example of seismic classification using an unsupervised machine learning algorithm. In other words, we will classify a cube using the K-Means algorithm.

We need to create a pipeline similar to the one from the previous tutorial. This time, we can increase the size of our iline blocks so that each block is sent to a worker.

In [None]:
from dasf_seismic.datasets import F3

dataset = F3(chunks={"iline": 5})
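
As a rough illustration of what a chunk size of 5 along the iline axis means, the sketch below splits an index range into consecutive blocks. The helper `chunk_bounds` and the inline count of 23 are hypothetical and are not part of DASF; the real partitioning is done by the library.

```python
def chunk_bounds(n_items, chunk_size):
    """Return (start, stop) index pairs that split range(n_items) into chunks."""
    return [(start, min(start + chunk_size, n_items))
            for start in range(0, n_items, chunk_size)]


# Hypothetical cube with 23 inlines, split into blocks of 5 inlines each.
bounds = chunk_bounds(23, 5)
print(bounds)
# [(0, 5), (5, 10), (10, 15), (15, 20), (20, 23)]
```

Each pair is one block of inlines that could be handed to a worker; the last block is smaller when the axis length is not a multiple of the chunk size.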

The ExtractData operator is used to get the array data from the Dataset class.

In [None]:
from dasf.transforms import ExtractData

extracted_data = ExtractData()

We are using the *F3 Block* and would like to check whether this data shows any hydrocarbon-related structure. To do that, we selected three attributes commonly used to highlight such structures: Envelope, Sweetness, and Apparent Polarity.

In [None]:
from dasf_seismic.attributes.complex_trace import Envelope, Sweetness, ApparentPolarity

envelope = Envelope()
sweetness = Sweetness()
polarity = ApparentPolarity()
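
To give an idea of what a complex-trace attribute computes, here is a dependency-free sketch of the envelope, using its usual textbook definition: the magnitude of the analytic signal of a trace. This is an assumption for illustration only. It is not taken from `dasf_seismic`, whose implementation may differ, and the naive DFT used here is only suitable for tiny examples.

```python
import cmath
import math


def analytic_signal(trace):
    """Analytic signal of a real trace, built with a naive O(n^2) DFT."""
    n = len(trace)
    # Forward DFT of the real trace.
    spectrum = [
        sum(trace[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero the negative ones.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    # Inverse DFT of the one-sided spectrum.
    return [
        sum(spectrum[k] * weights[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def trace_envelope(trace):
    """Envelope (instantaneous amplitude) of a trace."""
    return [abs(z) for z in analytic_signal(trace)]


# A pure cosine with unit amplitude has a constant envelope of 1.
toy_trace = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
toy_envelope = trace_envelope(toy_trace)
```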

Now, we need to combine these attributes into a single dataset, so we concatenate them into a single block. We can concatenate them into a new array or into a dataframe; it does not matter. However, the data must be 2-D (samples by N features), because most machine learning algorithms expect that input shape.

In [None]:
from dasf.transforms import ArraysToDataFrame

arrays2df = ArraysToDataFrame()
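
The idea of turning several 3-D attribute cubes into one 2-D table can be sketched in plain Python as follows. The helpers `flatten` and `arrays_to_table` and the toy values are hypothetical; they only illustrate the layout (one row per sample, one column per attribute) and are not the `ArraysToDataFrame` implementation.

```python
def flatten(cube):
    """Flatten a nested 3-D list (iline, xline, time) into a 1-D list."""
    return [value for plane in cube for row in plane for value in row]


def arrays_to_table(**attributes):
    """Stack flattened attribute cubes as columns of a 2-D table."""
    names = list(attributes)
    columns = [flatten(attributes[name]) for name in names]
    rows = [list(row) for row in zip(*columns)]
    return names, rows


# Toy cubes of shape (1, 2, 2); the values are made up.
names, rows = arrays_to_table(
    envelope=[[[1, 2], [3, 4]]],
    sweetness=[[[10, 20], [30, 40]]],
    polarity=[[[-1, 1], [1, -1]]],
)
print(names)  # ['envelope', 'sweetness', 'polarity']
print(rows)   # [[1, 10, -1], [2, 20, 1], [3, 30, 1], [4, 40, -1]]
```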

It is common not to train an algorithm on the whole dataset. Here, let's use only 5% of the data to train K-Means. This is fast enough, and it avoids a crash when we execute the persist method.

In [None]:
from dasf.transforms import SliceArrayByPercent

slicearr = SliceArrayByPercent(5.0)
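
A minimal sketch of slicing by percentage is shown below. It assumes that the slice keeps the leading portion of the first axis, which is suggested by the `il*5//100` shape used when reshaping later in this tutorial; the helper `slice_by_percent` and the 200-inline cube are hypothetical.

```python
def slice_by_percent(cube, percent):
    """Keep the leading `percent` of the first axis (assumed behaviour)."""
    n_keep = int(len(cube) * percent // 100)
    return cube[:n_keep]


# Hypothetical cube with 200 inlines; each inline is just a placeholder here.
toy_cube = [f"iline_{i}" for i in range(200)]
subset = slice_by_percent(toy_cube, 5.0)
print(len(subset))  # 10
```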

In the next step, we normalize the data to get a better result when we fit K-Means.

In [None]:
from dasf.transforms import Normalize

normalize = Normalize()
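
K-Means relies on distances, so a feature with a much larger numeric range would otherwise dominate the clustering. One common way to put features on a comparable scale is z-score standardization, sketched below. Whether `Normalize` uses exactly this formula is an assumption; the helper `standardize` and the sample values are for illustration only.

```python
import statistics


def standardize(column):
    """Rescale one feature column to zero mean and unit standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(value - mean) / std for value in column]


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, std 2
scaled = standardize(values)
print(scaled)
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```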

To avoid recalculation, we can keep the data persisted after the attributes are computed. **It can reduce memory usage, but it is dangerous when you are using a local Dask cluster.**

In [None]:
from dasf.transforms import PersistDaskData

persist = PersistDaskData()

In the next step, let's create our K-Means instance. It is important to understand that K-Means is treated here as a parameter, because it does not change its state.

In [None]:
from dasf.ml.cluster import KMeans

kmeans = KMeans(n_clusters=15, max_iter=50)
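
For readers who want to see what `fit_predict` does conceptually, here is a small pure-Python sketch of the classic K-Means loop (assign each sample to the nearest centroid, then recompute the centroids). It is not the DASF implementation: the function name `kmeans_fit_predict` is hypothetical, and the initialization simply takes the first samples as centroids so that the example is deterministic.

```python
import math


def kmeans_fit_predict(samples, n_clusters, max_iter=50):
    """Cluster samples with a basic K-Means loop and return their labels."""
    # Deterministic initialization for this sketch: the first samples.
    centroids = [list(sample) for sample in samples[:n_clusters]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(n_clusters), key=lambda c: math.dist(sample, centroids[c]))
            for sample in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(n_clusters):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


toy_samples = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.1),
               (9.8, 10.1), (0.1, 0.3), (10.2, 9.9)]
toy_labels = kmeans_fit_predict(toy_samples, n_clusters=2)
print(toy_labels)  # [0, 1, 0, 1, 0, 1]
```

The result is one integer label per sample, which is why the output has to be reshaped before it can be plotted as a seismic section.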

To finish our definitions, we want to plot the predicted data. Since KMeans.fit_predict returns a 1-D array, we need to reshape its output and then plot an inline.

In [None]:
from dasf.transforms.operations import Reshape
from dasf_seismic.visualization import Plot2DIline

il, xl, z = dataset.shape

reshape = Reshape(shape=(il*5//100, xl, z))
iline_index = 20

plot = Plot2DIline(name=("Plot F3 block iline=" + str(iline_index)), iline_index=iline_index, swapaxes=(0, 1), cmap="rainbow")
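
The shape arithmetic used for the reshape can be checked on a toy example. The sketch below assumes row-major ordering and uses hypothetical dimensions (`toy_il`, `toy_xl`, `toy_z`); the helper `reshape_3d` is not part of DASF.

```python
def reshape_3d(flat, shape):
    """Reshape a flat list into nested lists of the given 3-D shape (row-major)."""
    n0, n1, n2 = shape
    if len(flat) != n0 * n1 * n2:
        raise ValueError("number of labels does not match the target shape")
    return [
        [flat[(i * n1 + j) * n2:(i * n1 + j) * n2 + n2] for j in range(n1)]
        for i in range(n0)
    ]


# Hypothetical cube dimensions; 5% of 40 inlines leaves 2 inlines.
toy_il, toy_xl, toy_z = 40, 3, 2
kept_il = toy_il * 5 // 100
flat_labels = list(range(kept_il * toy_xl * toy_z))  # stand-in for cluster labels
label_cube = reshape_3d(flat_labels, (kept_il, toy_xl, toy_z))
print(label_cube[1][2])  # [10, 11]
```

The first dimension of the target shape must match the percentage kept by the slicing step; otherwise the number of labels will not fit the shape.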


Now, let's create a local Dask cluster without using a GPU. If you have a GPU available, you can change this setting; the purpose of this step is to show how features can be enabled or disabled.

In [None]:
from dasf.pipeline.executors import DaskPipelineExecutor

dask = DaskPipelineExecutor()

Now, it is time to build our pipeline. Remember that we have two parameters: the F3 dataset and K-Means.

In [None]:
from dasf.pipeline import Pipeline

pipeline = Pipeline("F3 Block plot pipeline", executor=dask)

pipeline.add(extracted_data, X=dataset) \
        .add(slicearr, X=extracted_data) \
        .add(envelope, X=slicearr) \
        .add(sweetness, X=slicearr) \
        .add(polarity, X=slicearr) \
        .add(arrays2df, envelope=envelope, sweetness=sweetness, polarity=polarity) \
        .add(normalize, X=arrays2df) \
        .add(persist, X=normalize) \
        .add(kmeans.fit_predict, X=persist) \
        .add(reshape, X=kmeans.fit_predict) \
        .add(plot.plot, X=reshape) \
        .visualize()
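
The dependencies declared by the `add` calls form a directed acyclic graph. As an illustration only, the sketch below rebuilds that graph with the standard-library `graphlib` module (Python 3.9+) and prints one valid execution order. The node names are just labels copied from the variables above, and this is not how DASF or Dask actually schedules the work.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set, mirroring the add() calls.
dependencies = {
    "extracted_data": {"dataset"},
    "slicearr": {"extracted_data"},
    "envelope": {"slicearr"},
    "sweetness": {"slicearr"},
    "polarity": {"slicearr"},
    "arrays2df": {"envelope", "sweetness", "polarity"},
    "normalize": {"arrays2df"},
    "persist": {"normalize"},
    "kmeans.fit_predict": {"persist"},
    "reshape": {"kmeans.fit_predict"},
    "plot.plot": {"reshape"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The three attribute nodes depend only on `slicearr`, so they are independent of each other and are natural candidates for parallel execution.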

Now, it is time to run it and plot.

In [None]:
%time pipeline.run()