Copyright (c) Microsoft Corporation.

Licensed under the MIT License.

# Converting SEG-Y files for training or validation

This notebook describes how to prepare your own SEG-Y files for training.

If you don’t have your own SEG-Y file, you can run the *01_segy_sample_files.ipynb* notebook to generate synthetic files.

To use your own SEG-Y volumes to train models in the DeepSeismic repo, you need to bring at least one pair of seismic data and label data SEG-Y files with identical shapes. The seismic data file contains typical SEG-Y post-stack data traces, and the label data file should contain an integer class label at every sample in each trace.

For each SEG-Y file, run the convert_segy.py script to create an npy file. Optionally, you can normalize and/or clip the data in the SEG-Y file as it is converted to npy.
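To make "normalize and/or clip" concrete, here is a minimal sketch of one common way to do it on an amplitude array. The function name `clip_and_normalize`, the `n_std` parameter, and the choice of bounds and scaling are assumptions for illustration; convert_segy.py may use different ones (see its `--help` output).

```python
import numpy as np


def clip_and_normalize(volume, n_std=4.0):
    """Clip amplitudes to mean +/- n_std standard deviations, then scale to [-1, 1].

    Illustrative only: the bounds and the scaling are assumptions, not
    necessarily what convert_segy.py does.
    """
    volume = np.asarray(volume, dtype=np.float32)
    mean, std = volume.mean(), volume.std()
    # Clipping limits the influence of rare, very large amplitudes.
    clipped = np.clip(volume, mean - n_std * std, mean + n_std * std)
    # Divide by the largest absolute amplitude so all values fall in [-1, 1].
    max_abs = np.abs(clipped).max()
    return clipped / max_abs if max_abs > 0 else clipped
```

Clipping before scaling keeps a handful of outlier samples from compressing the rest of the volume into a narrow range.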

Once you have a pair of seismic data and related label npy files, you can edit one of the training scripts in the repo to use these files. One example is the [dutchf3 train.py](../../experiments/interpretation/dutchf3_patch/local/train.py) script.
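Before editing a training script, it can help to confirm that the pair of arrays meets the requirements above: identical shapes and integer class labels. The following is a minimal sketch; `check_pair` is a hypothetical helper, not part of the repo.

```python
import numpy as np


def check_pair(seismic, labels):
    """Raise ValueError if the two arrays cannot be used together as a training pair.

    Returns the sorted unique class labels found in the label volume.
    """
    if seismic.shape != labels.shape:
        raise ValueError(f"shape mismatch: {seismic.shape} vs {labels.shape}")
    if not np.issubdtype(labels.dtype, np.integer):
        # Labels stored as floats are accepted only if every value is a whole number.
        if not np.all(labels == np.round(labels)):
            raise ValueError("label volume contains non-integer values")
    return np.unique(labels)
```

In practice you would pass it the two arrays returned by `np.load` for your converted seismic and label files.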


In [None]:
from itkwidgets import view
import numpy as np
import os

SEGYFILE = './normalsegy.segy'
PREFIX = 'normalsegy'
OUTPUTDIR = 'data'

## convert_segy.py usage

In [None]:
!python ./convert_segy.py --help

## Example run

Convert the SEG-Y file to a single output npy file in the output directory. The command below passes `--clip`, so the data is clipped but not normalized.

In [None]:
!python ./convert_segy.py --prefix {PREFIX} --input_file {SEGYFILE} --output_dir {OUTPUTDIR} --clip

## Post processing instructions

There should now be one npy file in the output directory named normalsegy_10_100_00000.npy. The numbers relate to the anchor point
of the array. In this case, inline 10, crossline 100, and depth 0 is the origin [0,0,0] of the array.
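The naming convention described above can be read back programmatically. This is a minimal sketch with two hypothetical helpers, `parse_anchor` and `index_to_survey`; it assumes the file name always ends in `_<inline>_<crossline>_<depth>.npy`, and that inlines, crosslines, and depth samples increase by 1 per array index unless you pass other `steps`.

```python
import os


def parse_anchor(npy_path):
    """Return (prefix, inline, crossline, depth) parsed from a converted file name."""
    stem = os.path.splitext(os.path.basename(npy_path))[0]
    # The last three underscore-separated fields are the anchor point;
    # the prefix itself may contain underscores, so split from the right.
    prefix, inline, crossline, depth = stem.rsplit("_", 3)
    return prefix, int(inline), int(crossline), int(depth)


def index_to_survey(index, anchor, steps=(1, 1, 1)):
    """Map an array index (i, j, k) to (inline, crossline, depth).

    Assumption: unit steps along each axis unless `steps` says otherwise.
    """
    return tuple(a + i * s for i, a, s in zip(index, anchor, steps))


prefix, il, xl, depth = parse_anchor("./data/normalsegy_10_100_00000.npy")
print(prefix, index_to_survey((0, 0, 0), (il, xl, depth)))
```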

Rerun the convert_segy.py script for the related label file.
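As a sketch of that rerun, the helper below builds the command line using only the flags already shown in this notebook. The function `convert_command`, the label file `./labels.segy`, and the prefix `labels` are hypothetical; substitute your own. No `--clip` is passed here, on the assumption that class labels should be left unmodified.

```python
import shlex


def convert_command(input_file, prefix, output_dir, script="./convert_segy.py"):
    """Build a convert_segy.py command using only flags shown in this notebook."""
    return [
        "python", script,
        "--prefix", prefix,
        "--input_file", input_file,
        "--output_dir", output_dir,
    ]


# Hypothetical label file and prefix; replace with your own.
label_cmd = convert_command("./labels.segy", "labels", "data")
print(shlex.join(label_cmd))
```

The printed line can be pasted after `!` in a notebook cell, or the list can be passed to `subprocess.run(label_cmd, check=True)`.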

In [None]:
npydata = np.load(f"./{OUTPUTDIR}/{PREFIX}_10_100_00000.npy")
view(npydata, slicing_planes=True)

### Prepare train/test splits file

Once the data and label SEG-Y files are converted to npy, use the `prepare_dutchf3.py` script on the resulting npy files to generate the list of patches as input to the train script.

The next cell shows an example of how to run this script. Note that we use the same npy file (normalsegy_10_100_00000.npy) for both seismic and labels, purely for illustration.

Also, once you've prepared the data set, you'll find your files in the following directory tree:   

data_dir   
├── output_dir   
├── split    
│&emsp;   ├── section_train.txt   
│&emsp;   ├── section_train_val.txt   
│&emsp;   ├── section_val.txt 

In [None]:
!python ../../../scripts/prepare_dutchf3.py split_train_val section --data_dir={OUTPUTDIR} --label_file={PREFIX}_10_100_00000.npy --output_dir=splits --section_stride=2 --log_config=None --split_direction=both