# GPR Data Processing in GPRpy

# Introduction

This jupyter notebook will walk you through one common set of steps for processing GPR data.

In this exercise, we will carry out the following steps:
* Create an instance of GPRpy's `gprpyProfile` class
* Import data into that GPR Profile
* Explore our data to better understand it
* Carry out processing steps on that GPR profile, including:
    * Direct-Current (DC) Offset Correction
    * Setting a "Zero time"
    * Dewowing the dataset
    * Background Subtraction/Removing mean trace value
    * Setting a propagation velocity
    * Carrying out FK migration
    * Topographical correction
    * Hilbert Transform/Enveloping

More information about each of these steps is given in the exercise below.

# Create an instance of GPRpy's gprpyProfile class

Different software packages will handle data analysis differently. In GPRpy, the recommended way of analyzing GPR data collected as a profile is as follows:
1. Import the main `gprpy` module of the `gprpy` package
    * We will also give it the alias `gp`
2. Create an empty `gprpyProfile()` class instance (usually this is "saved" to a variable)
3. Import data into the gprpyProfile variable
4. Carry out processing steps using methods of the `gprpyProfile()` class
    * Remember, methods are simply functions that are inherently part of a class instance. 
    * The methods are performed using a dot accessor. 
        * Methods in python will always have a set of parentheses ()
        * Methods may take parameters in those parentheses, or may not
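
To make the difference between a property and a method concrete, here is a minimal sketch using a made-up `ToyProfile` class. This class is purely hypothetical and is not part of GPRpy; it only illustrates the dot accessor and the parentheses.

```python
class ToyProfile:
    """Hypothetical stand-in for a profile class (NOT part of GPRpy)."""

    def __init__(self):
        # A property: it simply holds values, so it is accessed without parentheses
        self.data = []

    def add_sample(self, value):
        # A method that takes a parameter inside its parentheses
        self.data.append(value)

    def count(self):
        # A method that takes no parameters (the parentheses are still required)
        return len(self.data)


profile = ToyProfile()       # create an (empty) class instance
profile.add_sample(3.5)      # call a method with the dot accessor
profile.add_sample(-1.2)
n_samples = profile.count()  # call a method with empty parentheses
print(profile.data, n_samples)
```

Note that `profile.data` has no parentheses (it is a property), while `profile.count()` does (it is a method).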

Run the next cell to import the main gprpy module as `gp` and create a `gprpyProfile()` class instance called `rawGPRData` (or rename it to whatever you want; just be sure to update the variable name in the following cells if you do).

In [None]:
import gprpy.gprpy as gp
rawGPRData = gp.gprpyProfile()

# Data Import

Now we will import our data. First, we will set a variable equal to the filepath of the main data file (the .dt1 file).

The way data is organized from this equipment manufacturer (Sensors and Software) is that data is saved as follows (when collecting as a profile):
* .DT1 file: all of the raw data traces (1D data, or A-Scan data) are saved together as a profile (2D data, or B-scan) in a compressed binary file
* .hd file: the metadata for each profile (number of traces, length of profile, center frequency of antenna, etc.) is saved in an ASCII-type file (i.e., can be read by humans using a notepad-type software)
* .gps file (optional): the GPS data is saved in an ASCII-type file as a series of GPGGA "sentences". See [here](https://docs.novatel.com/OEM7/Content/Logs/GPGGA.htm) for more information on the GPGGA format.

GPRpy will search for the .hd file in the same directory as the .dt1 file that you use, so you need to make sure these files stay together. 
GPRpy does not support reading the .gps file, but most commercial software will.
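
Since GPRpy does not read the .gps file, one option is to parse the GPGGA sentences yourself. The sketch below assumes the standard comma-separated GPGGA field order (time, latitude, N/S, longitude, E/W, fix quality, satellites, HDOP, altitude, ...). The function name `parse_gpgga` and the example sentence are illustrative, not taken from the sample data, and a real .gps file may contain extra lines (such as trace numbers) that would need to be skipped.

```python
def parse_gpgga(sentence):
    """Return (latitude, longitude, altitude_m) from one GPGGA sentence.

    Latitude/longitude are converted from ddmm.mmmm / dddmm.mmmm to decimal degrees.
    """
    # Drop the checksum (everything after '*') and split into comma-separated fields
    fields = sentence.strip().split('*')[0].split(',')
    if not fields[0].endswith('GGA'):
        raise ValueError('Not a GGA sentence')

    def to_decimal_degrees(value, hemisphere, degree_digits):
        degrees = float(value[:degree_digits])
        minutes = float(value[degree_digits:])
        decimal = degrees + minutes / 60.0
        # Southern and western hemispheres are negative by convention
        return -decimal if hemisphere in ('S', 'W') else decimal

    latitude = to_decimal_degrees(fields[2], fields[3], 2)   # ddmm.mmmm
    longitude = to_decimal_degrees(fields[4], fields[5], 3)  # dddmm.mmmm
    altitude = float(fields[9])                              # meters above mean sea level
    return latitude, longitude, altitude


# Generic example sentence (not from the exercise dataset)
example = '$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47'
lat, lon, alt = parse_gpgga(example)
print(round(lat, 4), round(lon, 4), alt)
```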

Many equipment manufacturers also develop their own processing software, but third party processing software also exists. 

Examples of commercial software:
* EkkoProject: this is the Sensors and Software in-house GPR analysis software (other manufacturers have their own software)
* GPRSlice: a flexible software with very many features, but which has a steep learning curve, often used in archaeology
* ReflexW: a software originally intended for seismic processing, but which can be used for GPR if the GPR data can be converted to the correct format

Examples of open-source/free software:
* GPRpy: GPR processing software written in python that has a relatively easy-to-use user interface and has most (but not all) of the common GPR processing functionality and which supports several different types of GPR surveys
* RGPR: GPR processing software written in R that is fairly comprehensive and rigorous with respect to processing algorithms and visualization
* Geolitix: cloud-based processing (free for small projects with account)


In [None]:
# Update the dt1_filepath variable if you are not using GitHub Codespaces
dt1_filepath = r'/workspaces/GEOL451/GPR/GPRSampleData/LINE0.DT1'
rawGPRData.importdata(dt1_filepath)

# Set up plots for better viewing by setting the plots to be 20 "inches" wide and 4 "inches" high
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (20, 4)

# You will see this line in every code cell. It will plot what the data looks like after each step of the process
rawGPRData.showProfile()

# Data Exploration

First we will explore our dataset to better understand what we can do with it.

First, let's see what kind of properties and methods our `gprpyProfile()` class contains using the directory (`dir()`) command.

In [None]:
dir(rawGPRData)

Running the `dir()` command gives us a list of properties and methods contained within the `rawGPRData` instance of the `gprpyProfile()` class.

Items with underscores at the start and end are considered "private" or "background" properties or methods. We will not worry about these for now.

Some of the items listed here are methods, and some are properties. You can often access documentation or source code to figure out which are which.

For now, I will tell you that the `data` item is a property of `rawGPRData` and not a method. So, let's find the datatype of the `data` property. 

The `data` property can be accessed using a dot accessor, and we will use the `type()` command to determine its datatype:

In [None]:
type(rawGPRData.data)

This should be a [numpy matrix](https://numpy.org/doc/stable/reference/generated/numpy.matrix.html). 

[Numpy](https://numpy.org/) is by far the most commonly used package in python for holding data. The matrix class is a specialized version of the more common [numpy array](https://numpy.org/doc/stable/reference/arrays.ndarray.html) that retains its 2D nature through operations (the more common numpy array can be reshaped and resized by doing various operations). 
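
The practical consequence of "retaining its 2D nature" shows up when you slice out a single column. The following small sketch uses made-up numbers (not the GPR data) to compare a regular array with a matrix, and shows one way to flatten a matrix column to 1D.

```python
import numpy as np

# Made-up values: 3 "samples" (rows) by 4 "traces" (columns)
values = np.arange(12).reshape(3, 4)

as_array = np.array(values)
as_matrix = np.matrix(values)

# Slicing one column from a regular array drops down to 1D...
column_from_array = as_array[:, 1]
# ...but the same slice on a matrix stays 2D (a 3 x 1 column)
column_from_matrix = as_matrix[:, 1]

# Converting to a regular array and flattening gives a true 1D array
flattened = np.asarray(column_from_matrix).flatten()

print(column_from_array.shape, column_from_matrix.shape, flattened.shape)
```

This is why some cells later in this exercise wrap a trace in `np.array(...)` (and sometimes `.flatten()`) before doing calculations on it.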

Importantly, many of the operations that work on numpy arrays will also work on numpy matrix objects.

For example, let's find out the size and shape of our dataset (`shape` is a property of all numpy arrays):

In [None]:
rawGPRData.data.shape

This should print a result like:

`(232, 4684)`

This means that there are 4684 traces in our profile, each with 232 samples.

In gprpy, this metadata is also saved in the `info` property:

# Q1: Using the `info` property of the rawGPRData object, what is the value of the `'TZ_at_pt'` entry?

In [None]:
# Q1: info property of rawGPRData
# Access the info property of your rawGPRData object in this code cell


While working with our data, we may want to extract individual traces from the larger GPR Profile for visualization purposes. For example, if we want to visualize just the 100th trace in the profile, we could use the following code:

In [None]:
# We will use an index value of 99 for the 100th trace since the first trace would be at index 0
# The colon : indicates all values (or if it appears next to a number like :30 it means all values until the 30th or 30: all values after the 30th)
trace100 = rawGPRData.data[:, 99]
plt.plot(trace100)
plt.axhline(y=0, linestyle='dotted', linewidth=1, c='k')
plt.show()

This will give us a plot of the 100th trace (oriented horizontally). The x-axis is the sample number, the y-axis is the data value.

We can expand on this code to create a "wiggle plot" of all (or a part) of our data if we want, just for fun! Let's plot the first 100 traces in this way!

In [None]:
# Setup figure and import numpy for data manipulation
fig, ax = plt.subplots() # Create a matplotlib plot object
import numpy as np # import the numpy library

# Get the samples as an array
samples = np.arange(rawGPRData.data.shape[0])

# Set the traces to plot
start_trace = 0 # First trace to plot
end_trace = 100 # Last trace to plot

# Create a range based on the specified start and end trace
traces_to_plot = np.arange(start_trace, end_trace)
num_traces = len(traces_to_plot) # Calculate the number of traces in this range

# Set the trace spacing (as defined in the .hd file)
traceStep = 0.025 # Trace spacing of 2.5 cm or 0.025 m

# Loop through all traces and plot them on the same plot
for traceNo in traces_to_plot:
    current_trace = rawGPRData.data[:,traceNo] # Extract each trace from larger dataset
    normalized_trace = (current_trace / np.max(current_trace))*traceStep/2 # Normalize the values so they fit on the chart
    x0_value = traceNo * traceStep # Set the "center line" of each trace
    trace_to_plot = x0_value + normalized_trace # Add the trace data to plot around that centerline
    ax.plot(trace_to_plot, samples, c='k', linewidth=0.5) # Plot the trace
ax.set_ylim([232,0]) # Flip the y-axis so that sample 0 is at the top
ax.set_xlim([0,traceStep*num_traces]) # Trim plot to just our traces
plt.show() # Show plot

We can see from the plot above that there is a lot of "dead time" (roughly the first 40 samples) before we get our first large signal return. (We do not see much data underneath that at the moment, but we will pull that out as we process the data.)

We can use the data before the first large signal return to help "align" our data in a technique called the Direct-Current Offset Correction.

# Direct-Current Offset (DC Offset)

DC Offset is not supported by GPRpy natively, but we can "macgyver" the dataset to do this manually.

In our previous plot, we can see that the first 40-50 samples of all of our traces are dead time. This should align well with your answer to question 1.

Let's be conservative, and just use the first 30 samples. We will use the 100th trace again for our initial analysis.

Let's calculate the median of the first 30 samples of the trace. 

Since there is no data being received yet during these first 30 samples, we might expect the median value of the first 30 samples to be 0.

**NOTE**: to perform a median operation, we will have to reduce the matrix to a 1D array (even though it wants to stay in 2 dimensions). To do this, we will use the `np.array()` command.

In [None]:
print(np.nanmedian(np.array(trace100[:30])))

This should give you a value of about -202.5 as the median of the first 30 samples. This is not 0! This is the result of a phenomenon called "DC drift," which means that our GPR waveform is not centered on 0. 

We will attempt to shift the entire trace that amount to center it back on 0.

We can also perform this same operation for all traces, and find the average DC shift for all traces if we'd rather do it in bulk:

In [None]:
print(np.nanmedian(np.array(rawGPRData.data)[:30, :]))

This should be about -216.

When you are processing GPR data, you can often choose to do a trace-by-trace analysis (which may take longer) or do it all as one.

In this case, let's do it trace-by-trace:

In [None]:
# DC SHIFT CORRECTION

# First, copy the data to retain our old data as a separate variable
import copy
mygpr_dcshift = copy.copy(rawGPRData)

#import numpy
import numpy as np

# Set the number of samples to check for the DC offset
number_of_samples = 30

# Initialize a list that we will add shifted data to
updatedData = []

# Loop through each trace
for traceNo in range(mygpr_dcshift.data.shape[1]):
    # Convert each trace to a flattened (1D) array for easier data manipulation
    currentTrace = np.array(mygpr_dcshift.data[:, traceNo]).flatten()

    # Get the portion of the trace we will use for calculating dc shift
    pre0trace = currentTrace[:number_of_samples]
    dcshiftValue = np.nanmean(pre0trace) # Calculate the mean (nanmean still works if there are any missing data)

    # Shift the trace by subtracting the dc shift value for the pre-0 time portion from the entire trace
    shiftedTrace = currentTrace - dcshiftValue

    # Add the shifted data to our list of data
    updatedData.append(shiftedTrace)

# Reshape our list to the same type, size, and shape of the original data
shiftedGPRData = np.matrix(np.array(updatedData).T)

# Before we update our previous dataset, let's plot the two next to each other to see if we can see the difference
# (index 99 is the 100th trace, matching the earlier cells)
oldTrace100 = rawGPRData.data[:, 99]
newTrace100 = shiftedGPRData[:, 99]

fig, ax = plt.subplots()
# Plot a line at 0
plt.axhline(y=0, linestyle='dotted', c='k')

# Plot the old and new traces and zoom in on 0
plt.plot(oldTrace100, c='r', linewidth=1, linestyle='dotted', label='Original Data')
plt.plot(newTrace100, c='k', linewidth=0.5, label='DC Shifted Data')
plt.ylim([-1000,1000])
plt.legend(loc='upper left')

Make sure you set the `mygpr_dcshift.data` property equal to your newly shifted data!

In [None]:
mygpr_dcshift.data = shiftedGPRData
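
For comparison, the same per-trace correction can be written without a loop by letting NumPy broadcast the subtraction. This is a sketch on synthetic data (the array, its -200 offset, and the noise level are made up for illustration), not a replacement for the cell above.

```python
import numpy as np

# Synthetic stand-in for a profile: rows are samples, columns are traces
rng = np.random.default_rng(0)
n_samples, n_traces = 232, 50
synthetic = rng.normal(0.0, 5.0, size=(n_samples, n_traces)) - 200.0  # made-up DC offset

# Number of "dead time" samples used to estimate the offset
n_dead = 30

# One DC value per trace: average down the first n_dead samples of each column
dc_per_trace = np.nanmean(synthetic[:n_dead, :], axis=0)  # shape (n_traces,)

# Broadcasting subtracts each trace's own DC value from every sample in that trace
corrected = synthetic - dc_per_trace

print(dc_per_trace[:3])
print(np.nanmean(corrected[:n_dead, :], axis=0)[:3])
```

After the correction, the mean of the dead-time samples in every trace is (numerically) zero.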

# Q2: Explain *IN YOUR OWN WORDS* what the DC Shift correction step does
### Include the plot with the DC shifted and non-shifted data for trace 100 that you generated in the code cell that starts with "# DC SHIFT CORRECTION"

# Zero Time

Before we get into the processing of the actual data, we need to determine where our usable data actually begins in the record.
The measurement window is time-based (that is, we measure each trace for a certain amount of time before moving on to the next).
The measurement window starts as soon as the transmitting antenna begins to emit signal.
However, it takes some time before a) the direct signal reaches the receiving antenna, or b) the signal that reflects off the (sub)surface reaches the receiving antenna.
We would like to determine the depth below the ground surface (not necessarily the antenna location), so any time before the initial signal comes back to the receiving antenna can be interpreted as the time it takes for the signal to reach the ground surface.
We want our time t=0 ns (our "zero time") to be when the signal begins propagating down from the surface, not necessarily when it begins propagating from the antenna.

The way that we correct for this is to remove all the samples before this "zero time" from the dataset that we will use for further processing.
In other words, zero-time correction removes some of the first samples of the GPR signal.

In GPRpy, we use the `setZeroTime()` method. The only parameter that must be set for this method is the new zero time we would like to use.

We will set this time manually in gprpy, which is one simple way to do zero time correction. 

In other software, the zero time can be calculated automatically using one of the following methods (either on a per-trace basis, or by calculating the average value over an entire profile):
* The first time the signal intensity returned exceeds a specified threshold
* The first local maximum in the signal intensity measured (either positive or negative peak)
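
As an illustration of the first (threshold) approach, here is a minimal sketch on a synthetic trace. The function name `pick_first_break`, the 10% threshold, and the 0.4 ns sample spacing are all assumptions made for this example; they do not come from GPRpy or from the exercise dataset.

```python
import numpy as np

def pick_first_break(trace, threshold_fraction=0.1):
    """Return the index of the first sample whose absolute amplitude exceeds
    a fraction of the trace's maximum absolute amplitude (None if none does)."""
    trace = np.asarray(trace, dtype=float).flatten()
    threshold = threshold_fraction * np.max(np.abs(trace))
    above = np.flatnonzero(np.abs(trace) > threshold)
    return int(above[0]) if above.size else None


# Synthetic trace: low-level noise, then a decaying wavelet starting at sample 46
rng = np.random.default_rng(1)
trace = rng.normal(0.0, 1.0, 232)
k = np.arange(232 - 46)
trace[46:] += 1000.0 * np.exp(-0.1 * k) * np.cos(0.8 * k)

first_break = pick_first_break(trace)

# Convert the sample index to a time using a hypothetical sample spacing
sample_spacing_ns = 0.4
zero_time_ns = first_break * sample_spacing_ns
print(first_break, zero_time_ns)
```

To get a single value for a whole profile, one could run this on every trace and take the median of the picks.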

Before we do anything, let's look at a plot of our data

In [None]:
mygpr_dcshift.showProfile()

Look at the Y-axis and note where 0 is (hint: it is not at the top of the chart).

When we read our data in, gprpy used the profile's metadata (as recorded in the "TZ_at_pt" value of the `mygpr_dcshift.info` property) to determine where to set the 0 value.

This value is determined by the equipment itself, but may or may not be correct (or may not be included in the metadata with some instruments).

First, let's check again and see if that is reasonable (i.e., if the signal begins to arrive around that point)

In [None]:
trace100 = mygpr_dcshift.data[:, 99]
plt.plot(trace100)
plt.axhline(0, c='k', linestyle='dotted', linewidth=1)
plt.xlim([30,70]) # Zoom in on the arrival sample
plt.show()

The value for `TZ_at_pt` in `mygpr_dcshift.info` is the sample number, and should correlate to the value on your x-axis in the plot above.

# Q3: Does the value for `TZ_at_pt` in `mygpr_dcshift.info` seem reasonable as a starting time?
### Hint: Do we first start to see data coming into our GPR unit at about that sample number?
### Include the plot of the trace from the previous cell in your response

But we do not know what the actual time is without doing some calculations.

To determine the amount of time covered by each sample, we need to divide the total time window by the number of samples (i.e., the number of points per trace).

Using the values in `mygpr_dcshift.info`, calculate the time covered by each sample by dividing the total time window (this will be given in nanoseconds) by the number of points per trace.

**HINT**: You can access each value using square brackets. For example, to use the `TZ_at_pt` value, use: `mygpr_dcshift.info['TZ_at_pt']`

In [None]:
# Calculate the amount of time per sample here:


# Q3: How much time does each sample cover? 
### (i.e., what is the temporal "sample spacing" of our measurement)

Now, use your answers from question 1 (the number of samples until the time zero) and question 3 (the amount of time for each sample) to calculate the amount of time in nanoseconds until time zero:

In [None]:
# Calculate the amount of time into the measurement until we reach time zero


# Q4: How long into the measurement (in nanoseconds) does it take for us to reach "time zero"

Now, let's play with the zero time in the cell below.

First, we will perform a "shallow copy" on `mygpr_dcshift` each time we run the cell so that when we change the zero time, it does not change it for our original `mygpr_dcshift` dataset (just the `sandboxGPRdata` that we are playing with).

Then, set the variable `zero_time_in_nanoseconds` to whatever value you want.

You might start by setting it equal to your answer to Q4 * -1 (that is, "undo" the original zero time correction from when we read in the data) and see if that looks correct!

In [None]:
# Copy our data so we do not overwrite it as we are playing with it
import copy
sandboxGPRdata = copy.copy(mygpr_dcshift)

zero_time_in_nanoseconds = 
sandboxGPRdata.setZeroTime(zero_time_in_nanoseconds)
sandboxGPRdata.showProfile()
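
A note on why copying matters: a shallow copy (`copy.copy`) creates a new object, but its properties initially point at the same underlying arrays as the original. The sketch below uses a hypothetical `ToyProfile` class (not GPRpy's class) to show when a change to the copy does or does not leak back into the original. Whether a given GPRpy method reassigns or edits in place is not shown here; if in doubt, `copy.deepcopy` is the safer choice.

```python
import copy
import numpy as np

class ToyProfile:
    """Hypothetical container used only for this illustration (NOT GPRpy)."""
    def __init__(self, data):
        self.data = data


original = ToyProfile(np.zeros(3))

# Case 1: reassigning a property on a shallow copy leaves the original alone
shallow = copy.copy(original)
shallow.data = shallow.data + 1.0          # builds a brand-new array
after_reassign = original.data.copy()      # still all zeros

# Case 2: editing the shared array in place changes the original as well
shallow_again = copy.copy(original)
shallow_again.data[0] = 99.0               # same array object as original.data
after_inplace = original.data.copy()

# Case 3: a deep copy owns its own array, so in-place edits stay local
deep = copy.deepcopy(original)
deep.data[1] = -5.0

print(after_reassign, after_inplace, original.data)
```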

# Q5 Explain how the "data" that is removed by Zero time removal differs from the rest of the dataset
### (i.e., why do we want to remove the data prior to the zero time?)

# Background Subtraction

One of the most important steps in terms of removing noise for GPR data is Background Subtraction.

When we acquire GPR data, there are certain "static" reflectors, or objects that the GPR signal will always bounce off of at every trace:
* Though not a "reflector", the "air wave" (the direct pulse from transmitter to receiver) is one of these. This will occur at essentially the same time on every trace.
* The first reflection off the ground back to the receiver will always occur at about the same time
* Any reflections off the internal housing, battery, shielding, etc. will also always occur at more or less the same time

Unfortunately, these are some of the strongest signal returns in our raw GPR dataset. 

Fortunately, they are quite easy to remove because they are static.

Background subtraction removes the average value from each sample across all traces (or across a pre-set number of traces).

That is to say, it takes the value of the first sample in every trace, averages those values, then subtracts that average from the first sample of each trace. It then repeats this for every other sample.

For our dataset, then it would repeat that process 232 times (the number of samples).

There are essentially two ways we can alter the background subtraction method:
* We can use the mean or median value (usually median is used)
* We can use all the traces to calculate the average value that will be subtracted, or just x number of traces around each trace (usually, all traces are used)
    * Sometimes, you can lose important geologic information by using the entire record/profile, so you can retain a little more information (as desired)
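
The two choices above (mean vs. median, all traces vs. a moving window) can be sketched in a few lines of NumPy. The function `remove_background` below is a hypothetical illustration on synthetic data; it is not GPRpy's `remMeanTrace()` and may differ from it in details such as how the window is handled at the ends of the profile.

```python
import numpy as np

def remove_background(data, ntraces=None, use_median=False):
    """Subtract the average trace from a (samples x traces) array.

    ntraces=None (or >= number of traces) uses the whole profile;
    otherwise a moving window of about ntraces centered on each trace is used.
    """
    data = np.asarray(data, dtype=float)
    average = np.nanmedian if use_median else np.nanmean
    n_traces = data.shape[1]
    if ntraces is None or ntraces >= n_traces:
        # One average value per sample (row), subtracted from every trace
        return data - average(data, axis=1, keepdims=True)
    half = ntraces // 2
    out = np.empty_like(data)
    for i in range(n_traces):
        lo, hi = max(0, i - half), min(n_traces, i + half + 1)
        out[:, i] = data[:, i] - average(data[:, lo:hi], axis=1)
    return out


# Synthetic profile: identical horizontal banding on every trace plus one small "target"
band = np.linspace(-500.0, 500.0, 20)
profile = np.tile(band[:, None], (1, 40))
profile[10, 5] += 50.0

cleaned_median = remove_background(profile, use_median=True)
cleaned_mean = remove_background(profile)
cleaned_window = remove_background(profile, ntraces=9)
print(cleaned_median[10, 5], cleaned_mean[10, 5])
```

In this toy example the median removes the banding and leaves the target at its full amplitude, while the mean is pulled slightly by the target itself.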

Background subtraction is the first processing step where the results are almost always immediately visible.

Run the cell below and plot the results. You can set the `ntraces` parameter equal to the number of traces to use as the averaging window. Anything greater than 4684 (in our case) will be the same as using the entire record.

Feel free to change the `ntraces` parameter to see how different values affect the resulting data (if you use a value <4684, note that in your answer to Q6).

In [None]:
import copy
mygpr_dc_bs = copy.copy(mygpr_dcshift)

mygpr_dc_bs.remMeanTrace(ntraces=5000)
mygpr_dcshift.showProfile()
plt.show()
mygpr_dc_bs.showProfile()

# Q6: Describe in your own words the visual differences in the profile plot before and after Background Subtraction
### Include the plot of the data after background subtraction in your exercise.
### The cell above will plot the data before background subtraction first, then the data after background subtraction

# Apply Gain

Background subtraction changes the appearance of the profile to look more like the variegated results we might expect to see from a complex subsurface. However, we still cannot see much data below a few nanoseconds.

This is because the intensity of reflected data attenuates rather quickly (even in more ideal media than what we are using in this exercise). However, even though we cannot see the data at first glance, there is much usable data in our dataset that we can still recover.

The primary way to do this is by applying gain. Gain is essentially "turning up the volume" of the GPR signal. However, the data at the surface already has a high amplitude, and the deeper data has much lower amplitude. 

In order to recover a clean-looking dataset with features that are definable, we need to apply gain variably to different parts of the GPR record.

There are many ways to do this:

* **Constant**: a constant gain to all parts of the GPR data
    * This is not usually done except when even the first returns are relatively low amplitude (for example, in GPR acquired from an airplane over ice sheets)
* **Linear**: linearly increasing gain
    * This is also not used often because the attenuation of the signal does not tend to be linear
* **Power**: raises the time of a sample to a specified power
    * $x(t) * t^\alpha$
        * $x(t)$ is the GPR data value at a specified time
        * $t$ is that time
        * $\alpha$ is the value specified by the user
* **Exponential**
    * $x(t) * e^{\alpha * t}$ (same values as in the power method)
* **Automatic Gain Control (AGC)**: Normalizes the data values of the signal over a specified window width
    * Takes a window along each trace and calculates an average (can be RMS, mean, or median), then multiplies the data value at the center of the window ($x(t)$) by the inverse of that average
    * $x(t) * \frac{1}{Avg_{window}}$
    * AGC is probably the most commonly used gain method in seismic geophysics
* **Manual**: in some programs, you can manually create a depth/gain curve
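
To make the formulas above concrete, here is a minimal sketch of power, exponential, and RMS-based AGC gain applied to a single synthetic, exponentially decaying trace. The function names, the 0.4 ns sample spacing, and the decay rate are assumptions made for this example; these are not GPRpy's `tpowGain()` or `agcGain()` implementations.

```python
import numpy as np

def power_gain(trace, t, alpha):
    """x(t) * t**alpha"""
    return trace * t ** alpha

def exponential_gain(trace, t, alpha):
    """x(t) * exp(alpha * t)"""
    return trace * np.exp(alpha * t)

def agc_gain(trace, window):
    """Divide each sample by the RMS amplitude of a window centered on it."""
    trace = np.asarray(trace, dtype=float)
    half = window // 2
    out = np.zeros_like(trace)
    for i in range(trace.size):
        segment = trace[max(0, i - half): i + half + 1]
        rms = np.sqrt(np.mean(segment ** 2))
        out[i] = trace[i] / rms if rms > 0 else 0.0
    return out


# Synthetic trace: a 250 MHz oscillation that decays with time (times in ns)
t = np.arange(1, 201) * 0.4
trace = np.exp(-0.05 * t) * np.sin(2 * np.pi * 0.25 * t)

gained_power = power_gain(trace, t, alpha=2)
gained_exp = exponential_gain(trace, t, alpha=0.05)
gained_agc = agc_gain(trace, window=21)

# Compare late-time and early-time amplitudes before and after AGC
raw_ratio = np.max(np.abs(trace[-50:])) / np.max(np.abs(trace[:50]))
agc_ratio = np.max(np.abs(gained_agc[-50:])) / np.max(np.abs(gained_agc[:50]))
print(raw_ratio, agc_ratio)
```

Here the exponential gain with `alpha=0.05` exactly undoes the synthetic decay, which is only possible because we chose the decay rate ourselves; with real data the attenuation is unknown, which is one reason AGC is popular.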


Only Power and AGC gain are supported natively in GPRpy.

By default, we will use AGC gain (though a commented out line in the cell below also shows how to do this using the power gain as well if you would like to do that).

Feel free to play with the AGC window size or the power for tpowGain. Run the cell below and continue processing the data.

In [None]:
import copy
mygpr_dc_bs_gain = copy.copy(mygpr_dc_bs)

mygpr_dc_bs_gain.agcGain(window=10)
#mygpr_dc_bs_gain.tpowGain(power=2)
mygpr_dc_bs.showProfile(yrng=[0,70])
plt.show()
mygpr_dc_bs_gain.showProfile(yrng=[0,70])


# Set Velocity and FK Migration

Setting a propagation velocity allows GPR technicians to transform the Y axis from time to depth. It also allows us to carry out migration, which is a useful operation for making the GPR profile look more like an "image" of the subsurface and removes many of the GPR-specific artifacts.

With commercial software, the subsurface propagation velocity is usually determined using "hyperbola fitting". When there are discrete objects in the ground, the GPR data collected as the antenna passes over the object will form a hyperbola.

The shape of the hyperbola (i.e., its width) can be related to a propagation velocity. Commercial software usually has a way to overlay an adjustable hyperbola on the GPR profile so that you can visually match hyperbolas in the data against hyperbolas calculated for different velocities. The velocity of the hyperbola that best matches the data is usually the one that is used. Some software also enables setting different velocities for different parts of the profile.
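
The relationship between hyperbola width and velocity can be sketched with the standard travel-time equation for a point reflector, $t(x) = \sqrt{t_0^2 + (2(x - x_0)/v)^2}$, where $t_0$ is the two-way time at the apex. This sketch assumes a constant velocity and a transmitter and receiver at the same location; the function name and the example numbers are made up for illustration.

```python
import numpy as np

def hyperbola_twtt(x, x0, t0, velocity):
    """Two-way travel time (ns) along the diffraction hyperbola of a point reflector.

    x        : antenna positions along the profile (m)
    x0       : position of the reflector along the profile (m)
    t0       : two-way travel time at the apex, directly above the reflector (ns)
    velocity : propagation velocity (m/ns)
    """
    return np.sqrt(t0 ** 2 + (2.0 * (x - x0) / velocity) ** 2)


x = np.linspace(0.0, 4.0, 81)   # antenna positions every 5 cm
slow = hyperbola_twtt(x, x0=2.0, t0=15.0, velocity=0.06)
fast = hyperbola_twtt(x, x0=2.0, t0=15.0, velocity=0.12)

# Both curves share the same apex, but the slower velocity gives a narrower (steeper) hyperbola
print(slow[40], fast[40], slow[-1], fast[-1])
```

Plotting `slow` and `fast` against `x` on top of a profile (with the time axis pointing down) is essentially what hyperbola-fitting tools do: the velocity whose curve best follows the limbs of the observed hyperbola is the one you keep.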

Most of the open-source GPR software allows setting the velocity, but may not have hyperbola matching. If you are using GPRpy on your local computer and want to try the Graphical User Interface (see [here](https://nsgeophysics.github.io/GPRPy/) for instructions on how to do that), there is a way to define the location and velocity of a hyperbola and have it chart on top of your data (this is nice, but not as convenient as some of the commercially-available software).

Otherwise, you can often tell if you are generally correct in your velocity setting based on how well the migration works. Let's try this below.

First, let's calculate the depth of penetration of our dataset based on:
* A propagation velocity we specify (m/ns). For this site we will use 0.08 meters per nanosecond. 
    * In Illinois, a good starting point is usually 0.1 m/ns; less for more clay-rich soils; more for sandier soils
* The total time window of our GPR data (ns)
* The zero time (in nanoseconds) we calculated in Q4

(enter these data into the cell below to calculate the depth of penetration. This will also be used to cut off the extra data from our chart in the next code cells)

In [None]:
propogation_velocity = 0.08 #m/ns
time_window = 
zero_time = 

total_depth_window = ((time_window - zero_time) * propogation_velocity) / 2
total_depth_window

# Q7: What did you calculate for your depth of penetration?

We will now set our propagation velocity in the dataset.

The primary difference you should notice between the data before we set our velocity and after we set our velocity is that now we have a Y-axis in depth (rather than time) units. Knowing the velocity also enables us to perform migration.

**Migration** is a processing step that attempts to correct for the geometric distortions that result from the way in which the data was acquired and the way in which the waveform travels through the subsurface. Migration is similar to inversion, in that we are attempting to figure out a subsurface model from data we have acquired. Migration has been described as moving the waves "backward in time,...in effect pushing the waves backward and downward to their reflecting locations" ([AAPG Wiki](https://wiki.aapg.org/Seismic_migration)).

For example, the hyperbolae that are distinctive of GPR profile acquisition can be "collapsed" into a single point of high radar intensity using migration.

Migration is a very common step in both GPR processing and seismic data processing. For more information on various migration methods, see [this helpful presentation](https://indico.cern.ch/event/44081/sessions/173726/attachments/941031/1334497/1_2Mig_DMO.pdf) or, in more depth, [this overview](https://esd.halliburton.com/support/LSM/GGT/ProMAXSuite/ProMAX/5000/5000_8/Help/promax/mig_overview.pdf), which is the source for the explanations here (along with [this source](https://wiki.aapg.org/Seismic_migration)).

We will use FK Migration (sometimes called "Stolt migration") in this exercise, which is supported natively by GPRpy. Another common migration method is Kirchhoff Migration, which can be more robust in many situations. 
* FK Migration: converts time/space domain into a frequency/wavenumber domain using a Fourier transform
    * Relatively fast
    * Accurate under simple assumptions (e.g., constant velocity)
* Kirchhoff Migration: calculates travel times of surfaces/objects analytically using geometry and velocity
    * Not as fast as FK
    * One of earliest forms of migration
    * Often uses finite-difference to "sum" intensity values based on how waves are expected to travel (under assumed/provided velocity)
    * Can accommodate velocity variation quite well
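
To give a feel for what "collapsing a hyperbola" means, here is a minimal sketch of diffraction summation, the simplest Kirchhoff-style idea: for every output point, add up the amplitudes found along the hyperbola that a point reflector at that location would produce. This is a toy with a constant velocity and no amplitude or phase corrections, run on a synthetic point diffractor. It is not GPRpy's `fkMigration()` and is not a full Kirchhoff migration.

```python
import numpy as np

def diffraction_sum_migrate(data, dt, dx, velocity):
    """Toy diffraction-summation migration of a (samples x traces) array."""
    n_t, n_x = data.shape
    t0 = np.arange(n_t) * dt        # candidate apex two-way times (ns)
    x = np.arange(n_x) * dx         # trace positions (m)
    cols = np.arange(n_x)
    migrated = np.zeros((n_t, n_x))
    for ix0 in range(n_x):
        offset = x - x[ix0]
        # Two-way time on the hyperbola for each apex time (rows) and each trace (columns)
        t_hyp = np.sqrt(t0[:, None] ** 2 + (2.0 * offset[None, :] / velocity) ** 2)
        idx = np.rint(t_hyp / dt).astype(int)
        valid = idx < n_t           # ignore samples that fall below the record
        picked = data[np.where(valid, idx, 0), cols[None, :]]
        migrated[:, ix0] = np.sum(np.where(valid, picked, 0.0), axis=1)
    return migrated


# Synthetic data: a single point diffractor (all values are made up)
dt, dx, velocity = 1.0, 0.05, 0.08          # ns, m, m/ns
n_t, n_x = 120, 61
apex_sample, apex_trace = 20, 30
x = np.arange(n_x) * dx
t_diff = np.sqrt((apex_sample * dt) ** 2 + (2.0 * (x - x[apex_trace]) / velocity) ** 2)
synthetic = np.zeros((n_t, n_x))
synthetic[np.rint(t_diff / dt).astype(int), np.arange(n_x)] = 1.0

migrated = diffraction_sum_migrate(synthetic, dt, dx, velocity)
peak = np.unravel_index(np.argmax(migrated), migrated.shape)
print(peak, migrated[peak])
```

With the correct velocity, the energy spread along the hyperbola stacks up at its apex. Try a wrong velocity in the call to `diffraction_sum_migrate` and the peak becomes weaker and smeared, which is the same visual cue used to judge the velocity after FK migration.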

In [None]:
import copy
mygpr_dc_bs_gain_vel = copy.copy(mygpr_dc_bs_gain)

mygpr_dc_bs_gain_vel.setVelocity(velocity=propogation_velocity)
mygpr_dc_bs_gain_vel.showProfile(yrng=[0, total_depth_window])
plt.show()
mygpr_dc_bs_gain_vel.fkMigration()
mygpr_dc_bs_gain_vel.showProfile(yrng=[0, total_depth_window])
plt.show()

# Topography Correction

Topography correction is also sometimes called "static migration." Especially when you acquire data on a hill, the relationships between different reflectors may be distorted based on the geometry at which you acquired your data.

To account for this, we carry out a topography correction (or static migration).
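
Conceptually, a static topography correction shifts each trace up or down so that the top of each trace sits at the correct elevation. Below is a minimal sketch that works in the time domain, where an elevation difference $\Delta z$ becomes a two-way time shift of $2\Delta z / v$. The function `topo_shift` and its inputs are hypothetical and do not reproduce the internals of GPRpy's `topoCorrect()`.

```python
import numpy as np

def topo_shift(data, elevations, dt, velocity):
    """Shift each trace down according to how far it lies below the highest point.

    data       : (samples x traces) array
    elevations : ground elevation at each trace (m)
    dt         : sample spacing (ns)
    velocity   : propagation velocity (m/ns)
    Returns a taller array padded with NaN where there is no data.
    """
    data = np.asarray(data, dtype=float)
    n_t, n_x = data.shape
    drop = np.max(elevations) - np.asarray(elevations, dtype=float)  # m below the highest trace
    shift = np.rint(2.0 * drop / velocity / dt).astype(int)          # shift in samples
    out = np.full((n_t + shift.max(), n_x), np.nan)
    for i in range(n_x):
        out[shift[i]:shift[i] + n_t, i] = data[:, i]
    return out


# Tiny made-up example: the middle trace was collected 10 cm lower than its neighbors
data = np.ones((4, 3))
elevations = np.array([1.0, 0.9, 1.0])
shifted = topo_shift(data, elevations, dt=1.0, velocity=0.1)
print(shifted)
```

In the output, the middle column starts two samples lower than the others (0.1 m at 0.1 m/ns is 2 ns of two-way time), and the empty cells are filled with NaN.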

You should have already calculated topography in the Preprocessing exercise. 
Here, we will use a preconfigured topography file in the "GPRSampleData" folder. 
The filepath for that folder in the Github Codespace is already entered in the cell below. 
**You will need to change the topoFilePath variable if you are not using Github Codespaces!**

In [None]:
# Copy file before performing topography correction
import copy
mygpr_dc_bs_gain_vel_topo = copy.copy(mygpr_dc_bs_gain_vel)

# Set the filepath and apply the topography correction
topoFilePath = r"/workspaces/GEOL451/GPR/GPRSampleData/LINE0_XYZ.csv"
mygpr_dc_bs_gain_vel_topo.topoCorrect(topofile=topoFilePath, delimiter=',')

# Show before and after topo correction
mygpr_dc_bs_gain_vel.showProfile(yrng=[0, total_depth_window])
mygpr_dc_bs_gain_vel_topo.showProfile(yrng=[0, total_depth_window])

# Enveloping and the Hilbert Transform

We can create "envelopes" of our GPR signal using the Hilbert Transform. A nice example of what this looks like in practice (using matlab) is [here](https://www.mathworks.com/help/signal/ug/envelope-extraction-using-the-analytic-signal.html).

Summarizing a [signal processing stack exchange post](https://dsp.stackexchange.com/questions/25845/meaning-of-hilbert-transform), the Hilbert Transform lets us build the complex-valued "analytic signal" of the GPR waveform. The magnitude of the analytic signal is the "true (instantaneous) amplitude" of the waveform, as opposed to the actual recorded amplitude, which varies between positive and negative values. Using the magnitude of this analytic signal, we can create an "envelope" around the oscillating GPR signal.

Enveloping allows us to "smooth over" waveform modulations that arise from properties of the wave (and not the subsurface) and has the added benefit of resulting in only positive values, which can be easier to use for certain types of analysis (e.g., object detection).
Enveloping in GPR data is commonly carried out by the signal processing algorithm known as the Hilbert Transform. You may see it in different software packages identified either as "enveloping" or "hilbert transform" (or just "hilbert").
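
To show what the envelope is, here is a minimal sketch that builds the analytic signal with NumPy's FFT (zero out the negative frequencies, double the positive ones) and takes its magnitude. The function name `envelope` and the synthetic signal are made up for this example; in practice a library routine is the more convenient choice.

```python
import numpy as np

def envelope(trace):
    """Envelope of a 1D signal via the FFT-based analytic signal."""
    trace = np.asarray(trace, dtype=float)
    n = trace.size
    spectrum = np.fft.fft(trace)
    # Keep the zero frequency, double the positive frequencies, drop the negative ones
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights)   # complex-valued analytic signal
    return np.abs(analytic)                      # magnitude = envelope


# Synthetic signal: a fast oscillation whose amplitude varies slowly
n = 256
k = np.arange(n)
amplitude = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * k / n)   # the "true" slowly varying amplitude
wiggle = amplitude * np.cos(2 * np.pi * 32 * k / n)     # what a receiver would record

env = envelope(wiggle)
print(np.max(np.abs(env - amplitude)))
```

The recorded `wiggle` swings between positive and negative values, while `env` is always positive and traces the slowly varying amplitude that we built into the signal.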

Enveloping via the Hilbert Transform is not natively supported in GPRpy, but it is a part of many other common python libraries. 
We can use those to carry out the hilbert transform on our data quite easily. 

We will use the [hilbert function](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.hilbert.html) in the signal module of the scipy package for this.

>Note: scipy should already be installed in GitHub Codespaces, and may have been installed with gprpy. If it is not installed, you can install scipy using `conda install scipy` or `pip install scipy`.

In [None]:
# Import the hilbert function from the scipy signal module
from scipy.signal import hilbert
import copy
import numpy as np
mygpr_dc_bs_gain_vel_topo_hil = copy.copy(mygpr_dc_bs_gain_vel_topo)

#Extract the GPR data as a 2D Numpy array
dataArray = np.array(mygpr_dc_bs_gain_vel_topo_hil.data)

# Carry out the Hilbert transform and get the absolute value (i.e., the "envelope")
# The data has shape (samples, traces), so we transform along axis 0
# (i.e., down the time axis of each individual trace, but in bulk)
# The absolute value collapses the complex analytic signal to its magnitude, which is the envelope
dataHilbert = np.abs(hilbert(dataArray, axis=0))

# Set the data property of the gprpyProfile() object equal to the enveloped data
mygpr_dc_bs_gain_vel_topo_hil.data = dataHilbert

>Note: For this particular dataset, the hilbert transform does not add much visually to help us interpret the data.

Let's take a look at our good friend trace # 100 to see what this has done:

In [None]:
# Index 99 is the 100th trace, matching the earlier cells
trace100_preHilbert = np.array(mygpr_dc_bs_gain_vel_topo.data[:, 99]).flatten()
trace100_postHilbert = np.array(mygpr_dc_bs_gain_vel_topo_hil.data[:, 99]).flatten()

plt.axhline(y=0, linestyle='dotted', linewidth=1)
plt.plot(trace100_preHilbert, label='Pre-Hilbert')
plt.plot(trace100_postHilbert, label='Post-Hilbert (envelope)')
plt.legend(loc='upper left')
plt.show()

Our data is not a simple sinusoidal signal, and there are not many subsurface objects to bring out, so the hilbert transform is not a particularly effective processing step in this case.

The most obvious effect of the enveloping is near the surface, from about sample 46 (our estimated time0) to about sample 80. 

Here we see large positive and negative fluctuations transformed to relatively high amplitude values. 

The effect is actually simpler to see further back in our processing, before we performed the AGC gain. Run the cell below to see how the hilbert transform "envelopes" our pre-gained dataset.

In [None]:
preGainedData = mygpr_dc_bs #Change this variable if your pre-gained data is called something different

trace100_preHilbert = np.array(preGainedData.data[:, 99]).flatten() # Index 99 is the 100th trace
trace100_postHilbert = np.abs(hilbert(trace100_preHilbert))

plt.axhline(y=0, linestyle='dotted', linewidth=1)
plt.plot(trace100_preHilbert, label='Pre-Hilbert')
plt.plot(trace100_postHilbert, label='Post-Hilbert (envelope)')
plt.legend(loc='upper left')
plt.show()

Plot the entire dataset before and after the hilbert transform

In [None]:
# Display before and after
mygpr_dc_bs_gain_vel_topo.showProfile(yrng=[0, total_depth_window])
plt.show()
mygpr_dc_bs_gain_vel_topo_hil.showProfile(yrng=[0, total_depth_window])
plt.show()

# Q8: Include the output from this last code cell (the GPR Data before the hilbert transform and after) as the final part of your homework

# Q9: Explain as best as you can **in your own words** how we have manipulated the GPR data (either in paragraph form or bulleted list) over the course of the exercise. Be sure to include all the steps.
### You ***will not*** be penalized for getting a wrong answer, as long as we can see you have put thought into it. 
### You ***WILL*** be penalized for providing an answer that is not your own.
* DC shift
* Background Subtraction
* Gain
* Velocity Setting
* FK Migration
* Static Migration (topography correction)
* Hilbert Transform