# SpectralData class tutorial

This notebook is a simple demo showing how to use the SpectralData class in the prizmatoid module. The demo may change over time as both modules in prizm-data-wrangling evolve.

### Load the modules

Comment: it would be good to reach a state where we can add a setup.py file for easy installation.

In [44]:
import prizmatoid as pzt
import metadatabase as mdb

from matplotlib import pyplot as plt

from importlib import reload

#Reload both modules so that changes to their source are picked up without restarting the kernel
reload(pzt)
reload(mdb)

### Load Data from ctimes 152440000


Comments: 

- There is no longer a need to use the data/patch directory variables in mdb. It might be worth checking whether that is actually the best approach.
- The class currently assumes you'll load data per antenna (and not two antennas simultaneously).

In [23]:
data_directory = '../prizm_data/'
patches_directory = './patches_data'


ctime_intervals = [(152440000, 152450000)]
components=['100MHz', 'switch']
filters=['polarization_0']

#Initialize an empty SpectralData instance that doesn't contain data yet
data = pzt.SpectralData()
#Access the load method to load data into the SpectralData class instance (without printing information)
data.load_data(data_directory=data_directory, patches_directory=patches_directory, 
               ctime_intervals=ctime_intervals, verbose=False)

No files named `open.scio` could be found and/or read.
No files named `open.scio` could be found and/or read.


### Start operating on the data

Comments:
- trim_flags: applies the trim to all flags (if they exist). Is it useful to have different-sized trims per flag type?
- find_entries: does what find_antenna_times and half of find_short100 do.
- find_data_chunks: like the above, a generalisation.

In [38]:

#Generate the flags (all flags are optional parameters and default to False, except for switch_flags)
data.generate_flags(switch_flags=True)

#Trim all flags (with default setting, 1 leading timestamp and 1 trailing timestamp)
#Applied to all flags generated in the previous steps
data.trim_flags(trim = (1,1))

#Flag retrieval functions
short_indices = data.find_entries(switch_type='short')
start, end = data.find_data_chunks(switch_type='short')
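To make the trimming and chunk-finding steps concrete, here is a minimal NumPy sketch of the idea on a plain boolean array. It is not the prizmatoid implementation: the helper names `find_chunks` and `trim_flag`, the example flag, and the convention of exclusive chunk ends are all assumptions made for illustration.

```python
import numpy as np

def find_chunks(flag):
    """Return start (inclusive) and end (exclusive) indices of each contiguous run of True."""
    padded = np.concatenate(([False], np.asarray(flag, dtype=bool), [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # False -> True transitions
    ends = np.flatnonzero(edges == -1)    # True -> False transitions
    return starts, ends

def trim_flag(flag, trim=(1, 1)):
    """Drop trim[0] leading and trim[1] trailing samples from every contiguous run of True."""
    flag = np.asarray(flag, dtype=bool)
    trimmed = np.zeros_like(flag)
    for chunk_start, chunk_end in zip(*find_chunks(flag)):
        new_start, new_end = chunk_start + trim[0], chunk_end - trim[1]
        if new_start < new_end:
            trimmed[new_start:new_end] = True
    return trimmed

# Hypothetical switch flag: two intervals during which the 'short' state is active
short_flag = np.array([0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], dtype=bool)

trimmed_flag = trim_flag(short_flag, trim=(1, 1))
chunk_starts, chunk_ends = find_chunks(trimmed_flag)
selected_indices = np.flatnonzero(trimmed_flag)
```

Trimming each run separately removes the samples closest to a switch transition, which is presumably why the trim is applied before the entries are retrieved.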



### Analysis or Wrangling?

Below is a demo of the re-implementation of the get_short and run_data functions, plus a VNA reading tool that does not yet belong to any module.

In [49]:
import numpy as np
from scipy import signal
from scipy.interpolate import interp1d

def read_vna_data(filename, delimiter=",", encoding="ISO-8859-1"):
    """
    Reads VNA measurements for efficiency calculation.
    2017 - .csv with ; delimiters
    2018 - .csv with , delimiters
    2019 - .csv with , delimiters
    2020 - Non-existent
    2021 - .txt with \t delimiters

    Data from 2018 are .csv with peculiarities handled by this file.
    Data from 2022 are .txt with formatting as expected

    Parameters
    ----------
    filename : str
        Path to the VNA measurement file.
    delimiter : str
        Column delimiter used in the file.
    encoding : str
        Text encoding used to open the file.

    Returns
    -------
    data : numpy.ndarray
        Array of shape (n_data_rows, 6) holding the decoded measurements.

    Written by Kelly A. Foran
    Adapted by Ronniy C. Joseph
    """
    #Loop through the file to figure out where the header starts and how many data rows we're reading,
    #then create appropriate arrays
    header_line, n_data_rows = find_start_and_end(filename)
    data = np.zeros((n_data_rows, 6))

    read_labels = False
    read_data = False
    data_counter = 0

    with open(filename, 'r', encoding=encoding , errors = 'replace') as datafile:
        for index, line in enumerate(datafile):

            #Read VNA Measurement after the Column Header was read, see below.
            if read_data:
                data[data_counter] = decode_line(line, delimiter=delimiter, n_columns = len(column_header))
                data_counter +=1

            #Record Header Labels after the newline was found, see below.
            if read_labels:
                column_header = line.strip().split(delimiter)
                read_labels = False
                read_data = True

            #Look for the first blank line if you haven't found it already
            if not read_labels and not read_data:
                if line.split(delimiter)[0] == '\n':
                    read_labels = True
    return data


def find_start_and_end(filename, lookup="\n", encoding="ISO-8859-1"):
    """
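    This function reads a text file and returns the index of the line that follows the look-up string
    (the column header line), together with the number of data rows between that line and the end of the file.
    If the look-up string occurs more than once, the last occurrence is used.

    It was built to scan Nivek's S11 VNA measurement outputs and figure out where the data actually starts so you
    can ignore all the metadata

    Parameters
    ----------
    filename: str
        path to text file

    lookup: str
        string that marks end of metadata

    encoding: str
        type of encoding (utf-8, etc.)

    Returns
    -------
    header_line: int
        zero-based index of the column header line

    n_data_rows: int
        number of lines after the column header line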

    """
    header_line = None
    with open(filename, encoding=encoding) as datafile:
        line_counter = 0
        for num, line in enumerate(datafile):
            if lookup == line:
                header_line = num + 1
            line_counter += 1
    try:
        n_data_rows = line_counter - header_line - 1
    except TypeError:
        print(f"Couldn't find look-up string {lookup!r}. Are you sure the file actually contains it?")
        raise
    return header_line, n_data_rows


def decode_line(line, delimiter, n_columns):
    #Deals with formatting challenges across the various data sets
    #The challenge is that the frequency formatting changes with year and delimiter, depending on channel spacing
    #There are two cases:
    #All columns have two entries (pre- and post-comma)
    #Or the frequency column only has 1 entry, because there is no decimal comma

    tags = line.strip().split(delimiter)
    empty_column = int((len(tags) - 1) / 2)
    data = np.zeros(n_columns - 1)
    tags = tags[:empty_column] + tags[(empty_column + 1):]
    counter = 0

    #print(tags)
    #2021 VNA data, tab delimited data lines up with number of column labels
    if len(tags) == 6:
        for s, string in enumerate(tags):
            data[s] = string.replace(",", ".")
        data = data.astype(float)

    # <2021 data where every column has been split into before and after decimal points
    elif len(tags) == 12:
        indices = np.array([0,2,4,6,8,10])
        for s in indices:
            data[counter] = tags[s] + "." + tags[s+1]
            counter += 1
        data = data.astype(float)

    #<2021 where frequencies are integers, but other columns have been split into two
    #TODO build a more robust check: in the unlikely case the Magnitude/Phase columns are ints, this will fail
    elif len(tags) == 10:
        indices = np.array([0,1,3,5,6,8])
        for s in indices:
            if s == 0 or s == empty_column :
                data[counter] = tags[s]
            else:
                data[counter] = tags[s] + "." + tags[s+1]
            counter += 1
        data = data.astype(float)
    else:
        raise ValueError(f"A new case of VNA data formatting: {len(tags)} fields per line")
    return data


def find_efficiency(freqs, s11_filename, xsmooth, delimiter='\t'):
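    """
    Finds the antenna efficiency from the antenna S11 only.
    Reads VNA measurements for the efficiency calculation.

    2017 - .csv with ; delimiters
    2018 - .csv with , delimiters
    2019 - .csv with , delimiters
    2020 - Non-existent
    2021 - .txt with \t delimiters

    Parameters
    ----------
    freqs : array_like
        Frequencies at which to evaluate the efficiency, in the units of the file's frequency column.
    s11_filename : str
        Path to the S11 VNA measurement file.
    xsmooth : float
        Critical frequency passed to scipy.signal.butter for the smoothing filter.
    delimiter : str
        Column delimiter used in the file.

    Written by Kelly A. Foran
    Adapted by Ronniy C. Joseph

    """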
    # read the s11 data
    s11_data = read_vna_data(s11_filename, delimiter=delimiter)
    s11_freqs = s11_data[:, 0]

    # convert the s11 from dB to linear
    s11_magnitude = 10 ** (s11_data[:, 1] / 20)

    # interpolate and smooth efficiency to requested frequencies
    #Define a Butterworth Smoothing kernel
    sos = signal.butter(1, xsmooth, btype='lp', output='sos')
    #Filter data and interpolate
    lin = interp1d(s11_freqs, signal.sosfilt(sos, s11_magnitude), kind='slinear', fill_value="extrapolate")
    efficiency = 1 - (lin(freqs)) ** 2

    return efficiency
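The decimal-comma problem that decode_line handles is easiest to see on a single row. The sketch below is a simplified illustration, not a replacement for decode_line: the two example rows are made up, they contain three values rather than six, and the empty middle column that decode_line removes is left out.

```python
# Hypothetical row from a comma-delimited file that also uses decimal commas:
# every value is split into an integer part and a fractional part.
comma_line = "50000000,0,-12,35,101,7\n"
comma_tags = comma_line.strip().split(",")
# Rejoin each (integer, fraction) pair with a decimal point before converting
comma_values = [float(comma_tags[i] + "." + comma_tags[i + 1])
                for i in range(0, len(comma_tags), 2)]

# Hypothetical row from a tab-delimited file: the fields stay intact,
# so only the decimal comma has to be replaced.
tab_line = "50000000\t-12,35\t101,7\n"
tab_values = [float(tag.replace(",", ".")) for tag in tab_line.strip().split("\t")]
```

Both rows decode to the same three numbers, which is why the number of fields per line is enough to tell the formats apart as long as the frequency column is the only one that may lack a fractional part.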

In [50]:
vna_path = "../prizm_data/prizm_vna_2019/antenna_s11_v2/"
s11_ew = vna_path + '100-1-ew-s11.csv'
s11_ns = vna_path + '100-2-ns-s11.csv'


freqs = np.linspace(0, 250000000, num=4096, endpoint=True)
efficiency_smoothing = 0.02

#This is an adaptation of find_efficiency1 that is slightly more robust against the horror that is the 2017 data,
#but it is not yet quite suited for prizmatoid
efficiency_ew = find_efficiency(freqs, s11_ew, efficiency_smoothing, delimiter=',')
efficiency_ns = find_efficiency(freqs, s11_ns, efficiency_smoothing, delimiter=',')
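The conversion inside find_efficiency goes from S11 in dB to a linear amplitude, |S11| = 10^(dB / 20), and then to an efficiency of 1 - |S11|^2. The sketch below reproduces only those two steps on made-up S11 values. It is deliberately simplified: the Butterworth smoothing is omitted, and `np.interp` stands in for `interp1d`.

```python
import numpy as np

# Hypothetical S11 magnitudes in dB on a coarse frequency grid (Hz)
vna_freqs = np.array([0.0, 125e6, 250e6])
s11_db = np.array([-20.0, -10.0, -20.0])

# dB to linear amplitude: |S11| = 10^(dB / 20)
s11_linear = 10 ** (s11_db / 20)

# Linearly interpolate the amplitude onto a finer grid (no smoothing in this sketch)
target_freqs = np.linspace(0, 250e6, 5)
s11_interpolated = np.interp(target_freqs, vna_freqs, s11_linear)

# Efficiency is the fraction of power that is not reflected: 1 - |S11|^2
example_efficiency = 1 - s11_interpolated ** 2
```

A reflection of -20 dB corresponds to an efficiency of 0.99 and -10 dB to 0.9, so small changes in S11 near 0 dB have a much larger effect on the efficiency than changes at low reflection levels.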


In [51]:
power0, power1 = data.compute_power(eff_ew=efficiency_ew, eff_ns=efficiency_ns)