# Refining survey metadata using waveglider positions
This notebook helps a user refine the start and stop times of different survey patterns using existing survey metadata. If no survey metadata exists, please use the `generate_survey_metadata` notebook instead.
***
## 1. Import Packages

In [None]:
import os
from pathlib import Path
import json
import pandas as pd
from datetime import datetime, timedelta
import tiledb
import numpy as np
from matplotlib import pyplot as plt, colors as mcolors, cm, dates as mdates
from typing import List

# ES-SFGTools imports
from es_sfgtools.processing.pipeline import DataHandler
from es_sfgtools.utils.archive_pull import list_campaign_files
from es_sfgtools.utils.metadata.site import import_site, TopLevelSiteGroups, SubLevelSiteGroups
from es_sfgtools.utils.metadata.campaign import Survey

***
## 2. Data Preparation and Archive Retrieval

### 2.1 Set survey parameters

1. Input survey-related metadata, including:
* **network name** - name of the network the site belongs to
* **site name** - 4-character site name
* **campaign name** (e.g. YYYY_A_VESS) - check the metadata file to ensure you have the correct campaign name
* **vessel type** (e.g. SV2 or SV3) - this type will determine which pipeline to run

2. Set the local data directory you wish to use to store campaign files, logs, and tileDB arrays. It will be created if it does not yet exist. Under this directory, a `campaign_plots` directory will also be created for any plots we generate in this notebook. 

In [None]:
# Input survey parameters
network = 'cascadia-gorda'
site = 'NCC1'
campaign_name = '2024_A_1126'   
vessel_type = 'SV3'             # SV2 or SV3 is supported

# Set data directory path for local environment
directory = './data/sfg'

# ------------------------------------- #
# This will create the data directory if it does not exist
data_dir = Path(f"{os.path.expanduser(directory)}")
os.makedirs(data_dir, exist_ok=True)

# Add local survey plots directory
plots_dir = directory + '/campaign_plots'
plots_dir = Path(f"{os.path.expanduser(plots_dir)}")
os.makedirs(plots_dir, exist_ok=True)
print(f"Plots directory: {plots_dir}")
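
Because the campaign name and vessel type drive every later step, it can help to check them before any data is downloaded. The sketch below is one possible check; `validate_survey_params` is a hypothetical helper (not part of es_sfgtools), and the campaign-name pattern is an assumption inferred from the `YYYY_A_VESS` example above.

```python
import re

# Assumed pattern: 4-digit year, one uppercase letter, 4-character vessel code
CAMPAIGN_PATTERN = re.compile(r"^\d{4}_[A-Z]_[A-Za-z0-9]{4}$")
SUPPORTED_VESSELS = {"SV2", "SV3"}


def validate_survey_params(site: str, campaign_name: str, vessel_type: str) -> list:
    """Return a list of human-readable problems; an empty list means no problems were found."""
    problems = []
    if len(site) != 4:
        problems.append(f"Site name '{site}' is not 4 characters long")
    if not CAMPAIGN_PATTERN.match(campaign_name):
        problems.append(f"Campaign name '{campaign_name}' does not look like YYYY_A_VESS")
    if vessel_type not in SUPPORTED_VESSELS:
        problems.append(f"Vessel type '{vessel_type}' is not one of {sorted(SUPPORTED_VESSELS)}")
    return problems


# Check the example values used in this notebook
print(validate_survey_params("NCC1", "2024_A_1126", "SV3"))
```

An empty list means the values passed these basic checks; it does not confirm that the campaign exists in the metadata file.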

### 2.2 Load and inspect existing metadata

Please enter the location of your **metadata file** to import data. Campaign start and end dates will be saved for setting up the `DataHandler` class in the next step.

*If no campaign is displayed, ensure you have the correct campaign name in the cell above and re-run if necessary. If no campaign exists in the metadata, use the **generate_survey_metadata** notebook instead.*



In [None]:
# Enter the path to the metadata file
metadata_uri = "./site_vessels/NCC1.2025-03-20.json"    

# ---------------------------------- #
# Load and inspect existing metadata
print(f"Loading site metadata from {metadata_uri} ... \n")
site_metadata = import_site(metadata_uri)

# Get the campaign set above and add the surveys to a list we will use later
print(f"Searching for campaign: {campaign_name} ... \n")
surveys = []
for campaign in site_metadata.campaigns:
    if campaign.name == campaign_name:
        start = campaign.start
        end = campaign.end
        print(f"  Campaign: {campaign.name} \n   Start: {start} \n    End: {end}\n")

        for survey in campaign.surveys:
            print(f"  Survey: {survey.id} \n   Start: {survey.start} \n    End: {survey.end}")
            surveys.append(survey)

        break

if not surveys:
    print(f"No surveys found for campaign: {campaign_name}.\nPlease use the generate_survey_metadata notebook to create surveys.")
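
Note that if the campaign name is not found, the loop above never assigns `start` and `end`, so the next section would fail with a less obvious error. A minimal sketch of a lookup that fails early is shown here. `find_campaign` is a hypothetical helper, and the stand-in objects and their dates are for illustration only; real campaigns come from `import_site`.

```python
from datetime import datetime
from types import SimpleNamespace


def find_campaign(campaigns, name):
    """Return the first campaign whose .name matches, or raise a clear error."""
    match = next((c for c in campaigns if c.name == name), None)
    if match is None:
        available = ", ".join(c.name for c in campaigns) or "none"
        raise LookupError(f"Campaign '{name}' not found; available campaigns: {available}")
    return match


# Stand-in objects with made-up dates, used only to demonstrate the helper
demo_campaigns = [
    SimpleNamespace(name="2023_A_1063", start=datetime(2023, 9, 1), end=datetime(2023, 9, 20), surveys=[]),
    SimpleNamespace(name="2024_A_1126", start=datetime(2024, 9, 1), end=datetime(2024, 9, 25), surveys=[]),
]

demo = find_campaign(demo_campaigns, "2024_A_1126")
print(demo.start, demo.end)
```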

### 2.3 Set up the data handler class with our survey parameters

This sets up a `DataHandler` class using the data directory and site information set above. The data handler then populates the data directory with the necessary folders and TileDB arrays. This cell also grabs the correct pipeline for the vessel type specified (SV2/SV3).

In [None]:
# Set up the DataHandler class
data_handler = DataHandler(directory=data_dir) 

# Set the survey parameters
data_handler.change_working_station(network=network, 
                                    station=site, 
                                    campaign=campaign_name, 
                                    start_date=start.date(),    # Start date of the campaign (time not accepted)
                                    end_date=end.date())        # End date of the campaign (time not accepted)

if vessel_type == 'SV3':
    pipeline, config = data_handler.get_pipeline_sv3()
elif vessel_type == 'SV2':
    pipeline, config = data_handler.get_pipeline_sv2()
else:
    raise ValueError(f"Vessel type {vessel_type} not recognized")

### 2.4 Get the acoustic (DFOP00) files from the EarthScope archive

This will go to the EarthScope archive and list files for the specific campaign. The `DataHandler` class will then add this remote file list to the catalog stored within our data directory set above. It will then download only the DFOP00 files to our data directory. 

*If you have already run this cell prior, it will likely not need to download the files again.*

In [None]:
# Get DFOP00 file list from the archive
remote_filepaths = list_campaign_files(network=network, station=site, campaign=campaign_name)

# Add the data to the data handler
data_handler.add_data_remote(remote_filepaths=remote_filepaths)

# Download the dfop00 files, if not already downloaded (override=False)
data_handler.download_data(file_types='dfop00', override=False)

### 2.5 Read DFOP00 files into shotdata array

Using the SV pipeline chosen above, we will create an array from the DFOP00 files.

In [None]:
# Read DFOP00 files into shotdata array
config.dfop00_config.override=True          # Flag to override existing data
pipeline.config = config
pipeline.process_dfop00()

### 2.6 Convert data from array into a dataframe

Using the shotdata we just stored in a tileDB array, we will create a dataframe to use in the notebook going forward. 

In [None]:
# Set the shotdata array URI (built from data_dir so that any "~" in the path is expanded)
shotdata_uri = f"{data_dir}/{network}/{site}/TileDB/shotdata_db.tdb"

#  Get the start and end dates of the campaign
campaign_start = data_handler.date_range[0]
campaign_end = data_handler.date_range[1]

# Read data from the TileDB array
print(f"Reading dataframe from {shotdata_uri} for {campaign_start} to {campaign_end}")
with tiledb.open(shotdata_uri, mode="r") as array:
    shot_data_dataframe = array.df[slice(np.datetime64(campaign_start), np.datetime64(campaign_end)),:]

# Show preview of the data
shot_data_dataframe

***
## 3. Plot Waveglider Locations

Now that we have our data ready, we can begin plotting waveglider locations and refining our surveys.

### 3.1 Set plotting functions
We will use these two plotting functions to refine our surveys below. Run the next cell to define them. If you wish to change how the plots look, this is the place to do it.

In [None]:
def plot_en(df, surveys: List[Survey] = [], save_as: str = None):
    """
    Plots the East and North positions over time for a given dataset, with survey periods highlighted.

    Parameters:
    - df: DataFrame containing the survey data with columns like 'triggerTime', 'east0', and 'north0'.
    - surveys: List of survey objects, each containing 'start', 'end', 'type', and optional 'notes'.
    - save_as: Optional string specifying the filename to save the plot. If None, the plot is not saved.
    """

    # Create a figure with two subplots (one for East, one for North)
    fig, axs = plt.subplots(nrows=2, figsize=(16,10))  

    # Set x and y axis labels
    axs[0].set_ylabel("East (m)")
    axs[1].set_ylabel("North (m)")

    # Scatter plot for East positions
    sc0 = axs[0].scatter(
        df["triggerTime"],
        df["east0"],
        alpha=0.25
    )
    # Scatter plot for North positions
    sc1 = axs[1].scatter(
        df["triggerTime"],
        df["north0"],
        alpha=0.25
    )

    # Generate a rainbow colormap for the surveys
    survey_colors = cm.rainbow(np.linspace(0, 1, len(surveys)))
    # Highlight survey periods on both subplots
    for ax in axs:
        for i, survey in enumerate(surveys):
            start = survey.start
            end = survey.end
            label = survey.type + " " + survey.notes if survey.notes else survey.type

            # Highlight the survey period with a colored span
            ax.axvspan(start, end, color=survey_colors[i], alpha=0.3, label=label)
        
        # Place a major tick at each day:
        ax.xaxis.set_major_locator(mdates.DayLocator())
        
        # Format the x-axis to display dates in 'month-day' format
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
    
    # Rotate x-axis labels 90 degrees
    plt.xticks(rotation=90)

    # Add a legend to the first subplot
    axs[0].legend()

    # Save the plot to a file if a filename is provided
    if save_as is not None:
        plt.savefig(save_as)
        

def plot_wg_position(df, start: datetime, end: datetime, survey: Survey = None):
    """
    Plots the antenna position (East vs. North) for a specific survey within a given time range.

    Parameters:
    - df: DataFrame containing the survey data with columns like 'triggerTime', 'east0', and 'north0'.
    - start: Start time of the range to plot (datetime object).
    - end: End time of the range to plot (datetime object).
    - survey: Optional survey object; if given, its 'id' and 'type' are used in the plot title and filename.
    """
    # Filter the DataFrame for the specified time range using a single mask
    in_range = (df['triggerTime'] >= start) & (df['triggerTime'] <= end)
    temp_df = df[in_range]

    # Create a figure and axis for the plot
    fig, ax = plt.subplots(figsize=(16,10))

    # Set the title of the plot and the filename for saving the figure
    if survey is not None:
        survey_name = survey.id
        survey_type = survey.type
        title = f"{survey_name} {survey_type} from {start.isoformat()} to {end.isoformat()}"
        save_as = f"{plots_dir}/{survey_name}_{survey_type}.png"
        fig.suptitle(title)
    else:
        title = f"Survey from {start.isoformat()} to {end.isoformat()}"
        save_as = f"{plots_dir}/survey_{start.isoformat()}_{end.isoformat()}.png"
        fig.suptitle(title)

    # Set the x and y axis labels
    ax.set_xlabel("East (m)")
    ax.set_ylabel("North (m)")

    # Optional: Plot the origin point (?)
    #ax.scatter(0, 0, label="Origin", color="magenta",s=100)

    # Convert trigger times to timestamps and scale them for colormap normalization
    colormap_times = temp_df["triggerTime"].apply(lambda x:x.timestamp()).to_numpy()
    colormap_times_scaled = (colormap_times - colormap_times.min())/3600

    # Normalize the colormap to the range of scaled times
    norm = mcolors.Normalize(
        vmin=0,
        vmax=(colormap_times.max() - colormap_times.min()) / 3600,
    )

    # Scatter plot of East vs. North positions, colored by time
    sc = ax.scatter(
        temp_df["east0"],
        temp_df["north0"],
        c=colormap_times_scaled,
        cmap="viridis",
        label="Antenna Position",
        norm=norm,
        alpha=0.25
    )

    # Add a colorbar to indicate time in hours
    cbar = fig.colorbar(sc, ax=ax, label="Time (hr)")

    # Add a legend to the plot
    ax.legend()

    # Save the plot to a file
    plt.savefig(save_as)

    # Print the start and end times of the filtered data
    print(temp_df.triggerTime.iloc[0].isoformat(), temp_df.triggerTime.iloc[-1].isoformat())
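
The color scale in `plot_wg_position` is the time elapsed since the first shot in the selected range, in hours. The sketch below reproduces that scaling in isolation with plain `datetime` objects; `elapsed_hours` is a hypothetical helper and the times are illustrative.

```python
from datetime import datetime, timezone


def elapsed_hours(times):
    """Return hours elapsed since the earliest time, one value per input."""
    seconds = [t.timestamp() for t in times]
    t0 = min(seconds)
    return [(s - t0) / 3600 for s in seconds]


# Illustrative UTC times: 0 h, 1.5 h and 6 h after the first shot
demo_times = [
    datetime(2024, 9, 10, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 9, 10, 1, 30, tzinfo=timezone.utc),
    datetime(2024, 9, 10, 6, 0, tzinfo=timezone.utc),
]
print(elapsed_hours(demo_times))
```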


### 3.2 View the surveys available

If you wish to review the surveys available, run the next cell. You will need the survey IDs to plot individual surveys. The typical ID is in the format `CAMPAIGN_NAME_INTERVAL`.

In [None]:
print(f"There are {len(surveys)} surveys in the campaign {campaign_name} \n")
for survey in surveys:
    print(survey.model_dump_json(indent=2))

### 3.3 Plot the full campaign with all surveys

This next cell takes the start and end times of the shot data dataframe and plots the east and north positions over time, as well as a map of the waveglider positions. If surveys already exist within the metadata, they are shown and labeled on the east/north plots.


In [None]:
# Adjust to remove any junk data at beginning or end if needed
start = shot_data_dataframe.triggerTime.iloc[0] + timedelta(hours=0, minutes=0)
end = shot_data_dataframe.triggerTime.iloc[-1] - timedelta(hours=0, minutes=0)

# Filter the DataFrame for the specified time range using a single mask
in_range = (shot_data_dataframe['triggerTime'] >= start) & (shot_data_dataframe['triggerTime'] <= end)
temp_df = shot_data_dataframe[in_range]

# Plot the East and North positions over time
plot_en(temp_df, surveys)

# Plot the waveglider position
plot_wg_position(shot_data_dataframe, start, end)

***
## 4. Refine surveys
In this section we will begin refining our surveys. It may be run through multiple times as needed. 

### 4.1 Plot individual survey

Now that we have plotted the full campaign, let's narrow down to an individual survey.

In [None]:
# Enter the survey ID to plot - typically in the format of `campaignName_interval`,
# for example `2024_A_1126_1`. Check the surveys above for the correct ID.
survey_id = ""

# ----------------------------- #
# Get the correct survey object
survey = next((survey for survey in surveys if survey.id == survey_id), None)

if not survey:
    raise ValueError(f"Survey {survey_id} not found in the list of surveys.")
else:
    print(f"Plotting survey {survey_id} ...")

# Plot the East and North positions over time for the selected survey
plot_en(temp_df, [survey])

# Plot the waveglider position for the selected survey
plot_wg_position(df=shot_data_dataframe, start=survey.start, end=survey.end, survey=survey)

### 4.2 Refine the original survey

In this next cell we have the option to add or remove time from the start and end of the survey. Change the hours and minutes of the `timedelta` function for the `new_start` and `new_end` variables. The default is to add to the start time and subtract from the end time, which shortens the survey. If you wish to do otherwise, remember to change the +/- signs.

Re-run this cell multiple times and view the plots to refine the survey times to your desired start and end times, then move on to the next cell to update the original metadata.

In [None]:
# Update the hours and minutes for the survey you want to refine the times of.
new_start = survey.start + timedelta(hours=0, minutes=0)
new_end = survey.end - timedelta(hours=0, minutes=0)

# -------------------- #
# View the plots associated with the new start and end times
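# Highlight the refined window on the east/north plot using a copy of the survey
# (assumes Survey is a pydantic v2 model, as the model_dump_json call in 3.2 suggests)
plot_en(temp_df, [survey.model_copy(update={"start": new_start, "end": new_end})])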
plot_wg_position(df=shot_data_dataframe, start=new_start, end=new_end, survey=survey)
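
When trimming repeatedly, it is easy to end up with a start time that falls after the end time. A minimal sketch of the same arithmetic with that check added is shown here; `refine_window` is a hypothetical helper and the times are illustrative.

```python
from datetime import datetime, timedelta


def refine_window(start, end, trim_start=timedelta(0), trim_end=timedelta(0)):
    """Shift start later by trim_start and end earlier by trim_end.

    Negative timedeltas extend the window instead of trimming it.
    """
    new_start = start + trim_start
    new_end = end - trim_end
    if new_start >= new_end:
        raise ValueError(f"Refined window is empty: {new_start} >= {new_end}")
    return new_start, new_end


# Illustrative times only
demo_start = datetime(2024, 9, 10, 0, 0)
demo_end = datetime(2024, 9, 12, 0, 0)
refined = refine_window(
    demo_start,
    demo_end,
    trim_start=timedelta(hours=1, minutes=30),
    trim_end=timedelta(minutes=45),
)
print(refined[0].isoformat(), refined[1].isoformat())
```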

### 4.3 Update original survey metadata

Running this next cell will update the metadata of the survey you refined above in the site metadata loaded at the beginning of the notebook.

In [None]:
# Update the survey metadata, ID is required, updating start and end times
update_survey_data = {"id": survey.id, "start": new_start.isoformat(), "end": new_end.isoformat()}

# Update the survey metadata in the site metadata
site_metadata.run_sub_component(component_type=TopLevelSiteGroups.CAMPAIGNS, component_name=campaign_name,
                            sub_component_type=SubLevelSiteGroups.SURVEYS, sub_component_metadata=update_survey_data,
                            update=True)

# Reload the surveys to update the list with the refined times
surveys = []
for campaign in site_metadata.campaigns:
    if campaign.name == campaign_name:
        surveys.extend(campaign.surveys)

# Save an east/north plot showing all surveys with the refined times
plot_en(temp_df, surveys, save_as=f"{site}_{campaign_name}_surveys.png")
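
After refining several surveys, it may be worth confirming that their time windows no longer overlap. The sketch below is one way to check; `find_overlaps` is a hypothetical helper that only assumes each survey has `id`, `start` and `end` attributes, and the stand-in objects are for illustration only.

```python
from datetime import datetime
from types import SimpleNamespace


def find_overlaps(surveys):
    """Return (earlier_id, later_id) pairs where surveys adjacent in start order overlap."""
    ordered = sorted(surveys, key=lambda s: s.start)
    return [
        (a.id, b.id)
        for a, b in zip(ordered, ordered[1:])
        if b.start < a.end
    ]


# Stand-in surveys with made-up times; demo_2 starts before demo_1 ends
demo_surveys = [
    SimpleNamespace(id="demo_1", start=datetime(2024, 9, 10), end=datetime(2024, 9, 12)),
    SimpleNamespace(id="demo_2", start=datetime(2024, 9, 11, 18), end=datetime(2024, 9, 14)),
    SimpleNamespace(id="demo_3", start=datetime(2024, 9, 14), end=datetime(2024, 9, 16)),
]
print(find_overlaps(demo_surveys))
```

An empty list means no two surveys overlap. A survey that starts exactly when the previous one ends is not counted as overlapping.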

### 4.4 Refine another survey

If you haven't finished refining the surveys you want to refine, go back to 4.1, input a new survey ID and run through section 4 again. 

If you have finished, move on to section 5.


***
## 5. Finalize and Export Survey Data
 
### 5.1 View all survey maps

This cell will go through each survey and plot the waveglider positions for each one. 

In [None]:
for individual_survey in surveys:
    plot_wg_position(df=shot_data_dataframe, start=individual_survey.start, end=individual_survey.end, survey=individual_survey)

### 5.2 Export updated site metadata to file

This will export the updated metadata to a file in the current directory. The file name format is `[site].refined_surveys.[date].json`.

In [None]:
SITE_4_CHAR_ID = site # 4 char site ID

if not SITE_4_CHAR_ID:
    raise ValueError("Please enter a 4 char site ID")

# Export site metadata to a json file
# Add date to the file name
date = datetime.now().strftime("%Y-%m-%d")
file_path = f"./{SITE_4_CHAR_ID}.refined_surveys.{date}.json"          # Path of the exported file
site_metadata.export_site(file_path)
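
Because the file name includes only the date, exporting twice on the same day writes to the same path. The sketch below builds the file name and warns if the file already exists; `refined_metadata_path` is a hypothetical helper, and whether `export_site` overwrites an existing file is not confirmed here.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def refined_metadata_path(site_id: str, out_dir: Path = Path("."), on: Optional[date] = None) -> Path:
    """Build '<site>.refined_surveys.<YYYY-MM-DD>.json' inside out_dir."""
    on = on or date.today()
    return out_dir / f"{site_id}.refined_surveys.{on.isoformat()}.json"


# Fixed date used here only so the example is reproducible
demo_path = refined_metadata_path("NCC1", on=date(2025, 3, 20))
print(demo_path.name)
if demo_path.exists():
    print(f"Warning: {demo_path} already exists")
```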