# Preprocessing EM Data

EM data can be stored and processed in two file formats. The first is the NumPy array format, which we get when we capture EM data from a GNU Radio Companion (GRC) flowgraph through ZMQ sockets. The second is the cfile format, which we get when we save EM data using a File Sink block within the GRC flowgraph. A NumPy file can be loaded and handled as an ordinary NumPy array. A cfile, however, has to be read with the **getData()** function from the **emvince** library, which converts its contents into a NumPy array. Once converted, data from a cfile can be manipulated just like data from a NumPy file.
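To make the cfile-to-NumPy conversion concrete, here is a minimal sketch using plain NumPy. It assumes the cfile holds raw complex64 samples (interleaved 32-bit float I and Q values), which is what a File Sink with a complex item type writes. The file name is hypothetical, and this is not necessarily how **getData()** is implemented.

```python
import os
import tempfile

import numpy as np

# Build a tiny stand-in capture: 8 complex samples stored as complex64
# (assumed cfile layout: interleaved float32 I and Q values, no header).
samples = (np.arange(8) + 1j * np.arange(8)).astype(np.complex64)

with tempfile.TemporaryDirectory() as tmp:
    cfile_path = os.path.join(tmp, "example.cfile")  # hypothetical file name
    samples.tofile(cfile_path)

    # Interpreting the raw bytes as complex64 yields an ordinary NumPy array.
    loaded = np.fromfile(cfile_path, dtype=np.complex64)

print(loaded.dtype, loaded.shape)
```

Because a cfile carries no header, the reader has to know the sample type and the sampling rate in advance.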

In this Jupyter notebook, we will explore how to use the API functions to manipulate EM data stored in the two file formats.

- We are given two data files.
    1. A NumPy data file - 3.hackrf-data.npy
    2. A cfile data file - 3.hackrf-data.cfile
- First, we will load and plot the data from the two files separately.
- Then we will look at extracting only a segment of data from the files.

### 1. Importing required libraries

In [None]:
import matplotlib.pyplot as plt
import numpy as np
#from scipy import signal
from emvincelib import iq, ml, stat

%matplotlib inline

### 2. Setting the configurations

Both provided data files have a sampling rate of 10 MHz, so we set that first.

In [None]:
iq.sampleRate = 10e6

### 3. How big are the files?

In [None]:

file1 = "./data/preprocessing-em-data/hackrf-data.npy"
file2 = "./data/preprocessing-em-data/hackrf-data.cfile"


duration1 = iq.getTimeDuration(file1, fileType="npy")

print("Time duration of the numpy file: " + str(duration1) + " seconds")

duration2 = iq.getTimeDuration(file2, fileType="cfile")
                                       
print("Time duration of the cfile file: " + str(duration2) + " seconds")
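As a rough cross-check on the reported durations, the duration of a capture is its sample count divided by the sampling rate. The sketch below assumes a cfile of complex64 samples (8 bytes each) and uses a small synthetic file with a hypothetical name. It is not the implementation of **getTimeDuration()**.

```python
import os
import tempfile

import numpy as np

sample_rate = 10e6  # samples per second, matching the configuration above
bytes_per_sample = np.dtype(np.complex64).itemsize  # assumption: complex64 samples

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.cfile")  # hypothetical file name
    np.zeros(50_000, dtype=np.complex64).tofile(path)

    # File size -> number of samples -> duration in seconds.
    num_samples = os.path.getsize(path) // bytes_per_sample
    duration = num_samples / sample_rate

print("Samples:", num_samples, "Duration (s):", duration)
```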

### 4. Loading the entire data into memory

In [None]:
# mmap_mode='r' memory-maps the file: samples are read from disk on access
data_npy = np.load(file1, mmap_mode='r')
length = len(data_npy)
print("Number of samples in numpy data: " + str(length))

data_cfile = iq.getData(file2)
length = len(data_cfile)
print("Number of samples in cfile data: " + str(length))
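Note that `np.load` with `mmap_mode='r'` returns a memory-mapped array rather than copying the whole file into RAM. The sketch below, which uses a small synthetic file with a hypothetical name, shows the difference and how an explicit in-memory copy could be made when one is needed.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")  # hypothetical file name
    np.save(path, np.ones(1000, dtype=np.complex64))

    # Memory-mapped: data stays on disk and is paged in when accessed.
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)

    # np.array() makes an explicit in-memory copy as a plain ndarray.
    in_memory = np.array(mapped)

    del mapped  # release the mapping before the temporary file is removed

print(is_memmap, type(in_memory).__name__, in_memory.shape)
```

Memory mapping is convenient for large captures, because only the parts that are actually indexed need to be read.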

#### Here's a more unified way to load data

In [None]:
data1 = iq.getSegmentData(file1, 0, duration1, fileType='npy')
length = len(data1)
print("Number of samples in numpy data: " + str(length))

data2 = iq.getSegmentData(file2, 0, duration2, fileType='cfile')
length = len(data2)
print("Number of samples in cfile data: " + str(length))

### 5. Plotting data


#### Waveform Plot

In [None]:
iq.plotWaveform(data1)

In [None]:
iq.plotWaveform(data2)

#### Scatter Plot

In [None]:
iq.plotScatter(data1)

In [None]:
iq.plotScatter(data2)

#### FFT Plot

In [None]:
iq.plotFFT(data1)

In [None]:
iq.plotFFT(data2)
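For readers who want to see what an FFT view of IQ data involves, here is a minimal NumPy sketch on a synthetic 1 MHz complex tone. The signal and variable names are illustrative assumptions, and **plotFFT()** may compute or scale its output differently.

```python
import numpy as np

sample_rate = 10e6  # samples per second
n = 1000            # number of samples in the synthetic trace
tone_hz = 1e6       # frequency of the synthetic tone (assumption for the demo)

# Synthetic complex (IQ) tone standing in for captured EM data.
t = np.arange(n) / sample_rate
iq_samples = np.exp(2j * np.pi * tone_hz * t)

# FFT of complex data covers -fs/2 .. +fs/2; fftshift centres 0 Hz.
spectrum = np.fft.fftshift(np.fft.fft(iq_samples))
freqs = np.fft.fftshift(np.fft.fftfreq(n, d=1 / sample_rate))
magnitude = np.abs(spectrum)

peak_hz = freqs[np.argmax(magnitude)]
print("Strongest component at (Hz):", peak_hz)
```

Plotting `magnitude` against `freqs` with matplotlib would give a basic spectrum view of the trace.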

#### PSD Plot

In [None]:
iq.plotPSD(data1)

In [None]:
iq.plotPSD(data2)

#### Spectrogram Plot

In [None]:
iq.plotSpectrogram(data1)

In [None]:
iq.plotSpectrogram(data2)
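A spectrogram is essentially a sequence of short FFTs laid out over time. The sketch below builds one with plain NumPy from a synthetic tone, using non-overlapping Hann-windowed frames. The window size and the lack of overlap are assumptions made for brevity, and **plotSpectrogram()** may use different settings.

```python
import numpy as np

sample_rate = 10e6  # samples per second
window_size = 256   # samples per FFT frame (assumption for the demo)

# Synthetic complex tone standing in for captured EM data.
t = np.arange(window_size * 40) / sample_rate
iq_samples = np.exp(2j * np.pi * 1e6 * t)

# Split the trace into non-overlapping frames, one row per time step.
num_windows = len(iq_samples) // window_size
frames = iq_samples[: num_windows * window_size].reshape(num_windows, window_size)

# Window each frame, then take its FFT and the squared magnitude.
frames = frames * np.hanning(window_size)
spectrogram = np.abs(np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)) ** 2

print("Spectrogram shape (time steps, frequency bins):", spectrogram.shape)
```

Each row of `spectrogram` is the power spectrum of one time slice, so displaying the array as an image gives time on one axis and frequency on the other.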

### 6. Extracting Smaller Segments of Data

Higher sampling rates and longer capture durations produce larger data files, so sometimes we need only a smaller segment of an EM trace.

#### A segment from a NumPy file

Let's say we want to extract a 5-millisecond segment of data, starting at a time offset of 2 milliseconds, from the **3.hackrf-data.npy** file.

- Sample rate: 10 MHz
- Time period to skip: 2 milliseconds
- Time period to extract: 5 milliseconds

In [None]:
offset = 2e-3
duration = 5e-3
data1_segment = iq.getSegmentData(file1, offset, duration, fileType='npy')

length = len(data1_segment)
print("Number of samples in NumPy data segment: " + str(length))
iq.plotSpectrogram(data1_segment)
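The time values map to sample indices through the sampling rate: 2 ms at 10 MHz is 20,000 samples to skip, and 5 ms is 50,000 samples to keep. As a sketch of that arithmetic, here is the same extraction done by slicing a memory-mapped array. The file is synthetic with a hypothetical name, and this is not necessarily how **getSegmentData()** works internally.

```python
import os
import tempfile

import numpy as np

sample_rate = 10e6  # samples per second
offset = 2e-3       # seconds to skip
duration = 5e-3     # seconds to extract

# Convert times to sample counts; round() guards against float error.
start = int(round(offset * sample_rate))    # 20000 samples
count = int(round(duration * sample_rate))  # 50000 samples

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")  # hypothetical file name
    np.save(path, np.arange(100_000).astype(np.complex64))

    # Slice the memory-mapped file so only the segment is read from disk.
    mapped = np.load(path, mmap_mode="r")
    segment = np.array(mapped[start : start + count])
    del mapped  # release the mapping before the temporary file is removed

print("Number of samples in segment:", len(segment))
```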

#### A segment from a cfile

Let's say we want to extract a 5-millisecond segment of data, starting at a time offset of 2 milliseconds, from the **3.hackrf-data.cfile** file.

We can do that directly with the **getSegmentData()** function in the **emvince** library. Its first argument is the file name, and the second and third are the time offset and the segment duration in seconds. The calls in this notebook also pass a **fileType** argument to indicate the file format.

In [None]:
offset = 2e-3
duration = 5e-3
data2_segment = iq.getSegmentData(file2, offset, duration, fileType='cfile')

length = len(data2_segment)
print("Number of samples in cfile data segment: " + str(length))
iq.plotSpectrogram(data2_segment)
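For comparison, a segment can also be read from a raw cfile with plain NumPy by skipping a byte offset and reading a fixed number of samples. This sketch assumes complex64 samples and a NumPy version that supports the `offset` argument of `np.fromfile` (1.17 or later). The file is synthetic with a hypothetical name, and this is not a description of how **getSegmentData()** is implemented.

```python
import os
import tempfile

import numpy as np

sample_rate = 10e6  # samples per second
offset = 2e-3       # seconds to skip
duration = 5e-3     # seconds to extract
bytes_per_sample = np.dtype(np.complex64).itemsize  # assumption: complex64 samples

start = int(round(offset * sample_rate))    # samples to skip
count = int(round(duration * sample_rate))  # samples to read

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.cfile")  # hypothetical file name
    np.arange(100_000).astype(np.complex64).tofile(path)

    # offset is given in bytes, count in items of the requested dtype.
    segment = np.fromfile(
        path,
        dtype=np.complex64,
        count=count,
        offset=start * bytes_per_sample,
    )

print("Number of samples in segment:", len(segment))
```

Reading only the requested range avoids loading the whole capture, which matters for long, high-rate recordings.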