# Rolling Mean Processing 

This Jupyter Notebook performs rolling mean processing on a [spool](https://dascore.org/tutorial/concepts.html#:~:text=read%20the%20docs!-,Data%20structures,-DASCore%20has%20two) of distributed acoustic sensing (DAS) data. It uses the [DASCore](https://dascore.org/) package and the ```lf_das.py``` script.


<svg width="100%" height="1">
  <line x1="0" y1="0" x2="100%" y2="0" style="stroke:rgb(0,0,0);stroke-width:2" />
</svg>


#### Notes: 
1. Before using this notebook, make sure the ```lf_das.py``` script is in the same directory as this notebook and that DASCore is installed using ```pip``` (recommended) or ```conda```:
    ```bash
    pip install dascore
    ```
    or
    ```bash
    conda install dascore -c conda-forge
    ```
2. The file formats DASCore supports are listed [here](https://dascore.org/#:~:text=specialized%20analysis/visualization.-,Supported%20file%20formats,-name).
3. This code was tested with Silixa iDAS sample data. For other datasets, change the ```scale_iDAS``` variable so that the raw data, whatever its units, is converted to strain rate.
4. The current version of DASCore (v0.0.13) can also read the SINTELA ONYX and Halliburton data formats (both PRODML v2.0) without issue.
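
Note 3 can be made concrete with a short sketch of the scaling step. The formula is the one this notebook uses for ```scale_iDAS``` in the rolling mean cell. The function name ```idas_scale``` and the example values (1000 Hz, 10 m) are assumptions for illustration, not values read from any dataset.

```python
def idas_scale(sampling_rate, gauge_length):
    """Scale factor this notebook applies to raw Silixa iDAS data.

    Mirrors the notebook's expression: (116 * sampling_rate / gauge_length) / 1e9,
    with sampling_rate in Hz and gauge_length in meters.
    """
    return float((116 * sampling_rate / gauge_length) / 1e9)


# hypothetical acquisition settings, for illustration only
example_scale = idas_scale(sampling_rate=1000.0, gauge_length=10.0)
print(example_scale)
```

For data from another interrogator, a function like this would be replaced by whatever conversion that instrument's documentation specifies.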
 

Current DASCore version: 0.0.13 (tested)

Date: 09/07/2023


Contact: [Ahmad Tourei](https://github.com/ahmadtourei/)

ahmadtourei@gmail.com

In [None]:
# import libraries
import warnings
warnings.simplefilter('ignore')

import dascore as dc
import matplotlib.pyplot as plt
import numpy as np

from dascore.units import s # s as seconds
from lf_das import _get_filename


### Get a spool of data to work on

In [None]:
# define data path (spool of data) and output folder 
data_path = '/mnt/h/data'
output_data_folder =  '/mnt/h/results'
output_figure_folder = '/mnt/h/figures'

# get the sorted spool of data from the defined data path (on the first run, it indexes the patches; on later runs, it updates the existing index file)
sp = dc.spool(data_path).sort("time").update()

# show the contents of the first 5 patches
content_df = sp.get_contents()
content_df.head()


### Get some metadata and define a sub spool (if needed)

In [None]:
# get sampling rate, channel spacing, and gauge length from the first patch
patch_0 = sp[0]
gauge_length = patch_0.attrs['gauge_length']
print("Gauge length = ", gauge_length)
channel_spacing = patch_0.attrs['d_distance']
print("Channel spacing = ", channel_spacing)
sampling_interval = patch_0.attrs['d_time']
print("Sampling interval = ", sampling_interval)
sampling_rate = 1/(sampling_interval / np.timedelta64(1, 's'))
print("Sampling rate = ", sampling_rate)

# select sub spool
t_1 = '2023-03-22 03:00:00'
t_2 = '2023-03-22 07:00:00'
ch_start = 400
ch_end = 1400
d_1 = patch_0.coords['distance'][ch_start] # in meters
d_2 = patch_0.coords['distance'][ch_end] # in meters
sub_sp = sp.select(distance=(d_1, d_2), time=(t_1, t_2))
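
The unit handling in the cell above can be checked in isolation with plain NumPy. The sketch below assumes a 1 ms sampling interval and a distance axis of 2000 channels spaced 1 m apart. Both are made-up values standing in for the patch attributes and coordinates.

```python
import numpy as np

# hypothetical sampling interval of 1 ms, stored as a NumPy timedelta
sampling_interval = np.timedelta64(1, 'ms')

# dividing by a one-second timedelta yields the interval in seconds as a float
interval_sec = sampling_interval / np.timedelta64(1, 's')
sampling_rate = 1 / interval_sec  # in Hz

# hypothetical distance axis: 2000 channels spaced 1 m apart
distance = np.arange(2000) * 1.0
ch_start, ch_end = 400, 1400
d_1, d_2 = distance[ch_start], distance[ch_end]  # distance limits in meters
print(sampling_rate, d_1, d_2)
```

Selecting by distance rather than by channel index keeps the sub spool definition valid even if patches start at different channel offsets.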


### Apply the rolling mean function and save the results in [DASDAE](https://dascore.org/api/dascore/io/dasdae/core/DASDAEV1.html) format

In [None]:
# define the target sampling interval (cutoff_freq = 1/(2*d_t)) 
d_t = 10.0

# determine window size in sec.
window = d_t*s

# determine step size in sec.
step = d_t*s

# define the scale to apply to the raw data
scale_iDAS = float((116*sampling_rate/gauge_length)/1e9)

# apply the rolling mean to each patch in the sub spool
for i, patch in enumerate(sub_sp):
    print("working on patch ", i)
    # apply rolling mean function
    rolling_mean_patch = patch.rolling(time=window, step=step, engine="numpy").mean()
    # scale data
    new_scaled_patch = rolling_mean_patch.new(data=rolling_mean_patch.data*scale_iDAS)

    # save the result to the output data folder (it is read back in the visualization step)
    filename = _get_filename(new_scaled_patch.attrs['time_min'],
                new_scaled_patch.attrs['time_max'])
    filename = output_data_folder + '/' + filename
    new_scaled_patch.io.write(filename, "dasdae")
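
To illustrate what a rolling mean with a step does to the time axis, here is a minimal NumPy sketch of a pandas-style trailing window. It is not DASCore's implementation; the function name ```rolling_mean_with_step``` and the demo array are assumptions, and whether DASCore places its output samples at exactly the same indices is not verified here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_mean_with_step(data, window, step):
    """Trailing-window rolling mean along axis 0, keeping every `step`-th row.

    Rows whose window is incomplete are NaN, as in pandas.
    """
    data = np.asarray(data, dtype=float)
    out = np.full(data.shape, np.nan)
    # the windows land on a new last axis; average over it
    out[window - 1:] = sliding_window_view(data, window, axis=0).mean(axis=-1)
    return out[::step]


# 10 time samples on 2 hypothetical channels
demo = np.column_stack([np.arange(10.0), 10 * np.arange(10.0)])
result = rolling_mean_with_step(demo, window=3, step=3)
print(result)
```

With the window equal to the step, each input sample contributes to at most one output sample, which is why the output can be treated as a decimated, low-frequency version of the input. The leading NaN row is the reason the visualization step drops NaN values before plotting.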
    

### Visualize results

In [None]:
# load the saved results as a spool and merge them into a single patch
rolling_spool = dc.spool(output_data_folder).chunk(time=None)
# get the patch out of spool
rolling_merged_patch = rolling_spool[0]
# get the data out of patch
rolling_merged_patch_data = rolling_merged_patch.data

# define time axis
n_samples = rolling_merged_patch_data.shape[0]
num_sec = int(n_samples*d_t)
time = np.linspace(0, num_sec, n_samples, dtype=np.float64, endpoint=False)
# drop time associated with nan values
time[np.isnan(rolling_merged_patch_data[:, 0])] = np.nan
time_no_nans = time[~np.isnan(time)]

# drop nan values of rolling mean results (same behavior as Pandas)
rolling_merged_patch_no_nans = rolling_merged_patch.dropna("time")

# make sure time and rolling mean results have the same shape
assert time_no_nans.shape[0] == rolling_merged_patch_no_nans.data.shape[0]
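
The alignment between the time axis and the NaN-free data can be sketched with a single boolean mask. The array below is hypothetical, and the sketch assumes that NaN values always fill a whole time row, so checking the first channel is enough.

```python
import numpy as np

d_t = 10.0  # seconds between output samples
# hypothetical merged result: 4 time samples x 2 channels, first row NaN
data = np.array([[np.nan, np.nan],
                 [1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

n_samples = data.shape[0]
# equivalent to np.linspace(0, n_samples*d_t, n_samples, endpoint=False)
time_axis = np.arange(n_samples) * d_t

# keep only the rows where the first channel is not NaN
valid = ~np.isnan(data[:, 0])
time_no_nans = time_axis[valid]
data_no_nans = data[valid]
print(time_no_nans)
```

Applying the same mask to both arrays guarantees matching lengths, which is what the assertion in the cell above checks.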

In [None]:
plt.figure(figsize=(12,8))

# define the channel of interest
channel = 1330
ch_inx = channel - ch_start

plt.plot(time_no_nans, rolling_merged_patch_no_nans.data[:, ch_inx], label='channel: ' + str(channel))

plt.legend(loc='best')
plt.ylabel('Strain rate (1/sec)')
plt.xlabel('Time (sec) \n (4 hours, starting from 2023/03/22 03:00:00 UTC)')
# plt.ylim(-3e-13, 3e-13)
plt.title('Rolling mean results')
plt.grid(True)

# the output sample interval equals the rolling step (d_t seconds)
file_name_lowfreq = '/rolling_mean_' + str(int(d_t)) + 'sec_sample_interval_channel' + str(channel) + '.jpeg'
plt.savefig(output_figure_folder + file_name_lowfreq, dpi=600, format='jpeg')
plt.show()
