# Generating survey metadata using waveglider positions

This notebook helps a user locate the start and stop times of different survey patterns when no survey metadata exists. If survey metadata already exists and you wish to refine the start and stop times of existing surveys, please use the `refine_survey_metadata` notebook instead. 
***
## 1. Import Packages

In [None]:
import os
from pathlib import Path
import json
import pandas as pd
from datetime import datetime, timedelta
import tiledb
import numpy as np
from matplotlib import pyplot as plt, colors as mcolors, cm, dates as mdates
from typing import List

# ES-SFGTools imports
from es_sfgtools.processing.pipeline import DataHandler
from es_sfgtools.utils.archive_pull import list_campaign_files
from es_sfgtools.utils.metadata.site import import_site, TopLevelSiteGroups, SubLevelSiteGroups
from es_sfgtools.utils.metadata.campaign import Survey, campaign_checks

***
## 2. Data Preparation and Archive Retrieval
### 2.1 Set survey parameters

1. Input survey-related metadata, including:
* **network name** - name of the network the site belongs to
* **site name** - 4 character site name
* **campaign name** (e.g. YYYY_A_VESS) - check the metadata file to ensure you have the correct campaign name
* **vessel type** (e.g. SV2 or SV3) - this type will determine which pipeline to run

2. Set the local data directory you wish to use to store campaign files, logs, and tileDB arrays. It will be created if it does not yet exist. Under this directory, a `campaign_plots` directory will also be created for any plots we generate in this notebook. 
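
The campaign name convention can be checked before any files are requested. The sketch below assumes that the `YYYY_A_VESS` pattern means a four-digit year, a single uppercase letter, and a four-character alphanumeric vessel code. That reading is an assumption based on the example above, and `parse_campaign_name` is a hypothetical helper, not part of ES-SFGTools.

```python
import re

# Assumed pattern: 4-digit year, one uppercase letter, 4-character vessel code
CAMPAIGN_PATTERN = re.compile(r"(\d{4})_([A-Z])_([A-Za-z0-9]{4})")


def parse_campaign_name(name: str) -> dict:
    """Split a campaign name into its parts, or raise if it does not fit the assumed pattern."""
    match = CAMPAIGN_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"Campaign name {name!r} does not look like YYYY_A_VESS")
    year, letter, vessel_code = match.groups()
    return {"year": int(year), "letter": letter, "vessel_code": vessel_code}


parsed = parse_campaign_name("2023_A_1063")
print(parsed)
```

A name that fails this check is not necessarily wrong, but it is worth comparing against the metadata file before continuing.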

In [None]:
# Input survey parameters
network = 'cascadia-gorda'
site = 'NBR1'
campaign_name = '2023_A_1063'   
vessel_type = 'SV3'             # SV2 or SV3 is supported

# Set data directory path for local environment
directory = './data/sfg'

# ------------------------------------- #
# This will create the data directory if it does not exist
data_dir = Path(f"{os.path.expanduser(directory)}")
os.makedirs(data_dir, exist_ok=True)
print(f"Data directory: {data_dir}")

# Add local survey plots directory
plots_dir = directory + '/campaign_plots'
plots_dir = Path(f"{os.path.expanduser(plots_dir)}")
os.makedirs(plots_dir, exist_ok=True)
print(f"Plots directory: {plots_dir}")

### 2.2 Load and inspect existing metadata

Please enter the location of your **metadata file** to import data. Campaign start and end dates will be saved for setting up the `DataHandler` class in the next step.

*If a campaign and surveys already exist and you just wish to refine them, please use the **refine_survey_metadata** notebook instead.*



In [None]:
# Enter the path to the site metadata file you want to use and update
metadata_uri = "./site_vessels/NBR1.2025-03-19.json"    

# ---------------------------------- #
# Load and inspect existing metadata
print(f"Loading site metadata from {metadata_uri} ... \n")
site_metadata = import_site(metadata_uri)

campaign_check = False
start = None
end = None

# Check if the campaign already exists in the site metadata
surveys = []
for campaign in site_metadata.campaigns:
    if campaign.name == campaign_name:
        campaign_check = True
        start = campaign.start
        end = campaign.end
        print(f"  Campaign: {campaign.name} \n   Start: {start} \n    End: {end}\n")

        for survey in campaign.surveys:
            print(f"  Survey: {survey.id} \n   Start: {survey.start} \n    End: {survey.end}")
            surveys.append(survey)
        
        if len(surveys) > 0:
            print(f"Found {len(surveys)} surveys in {campaign.name} campaign. If you wish to refine these surveys instead, please use the refine_survey_metadata notebook.")

        break

if not campaign_check:
    print(f"Campaign {campaign_name} not found in site metadata. Please create the campaign metadata below.")
else: 
    print(f"Campaign {campaign_name} found in site metadata. Proceed to create more surveys below.")


### 2.3 Set up the data handler class with our survey parameters

This will set up a `DataHandler` class using the data directory and site-related information we set above. By doing this, the data handler will populate our data directory with the necessary folders and tileDB arrays. This cell also grabs the correct pipeline for the vessel type specified (SV2/SV3).

In [None]:
# Set up the DataHandler class
data_handler = DataHandler(directory=data_dir) 

# Set the survey parameters
data_handler.change_working_station(network=network, 
                                    station=site, 
                                    campaign=campaign_name) 
                                    # start_date=start.date(),    # Start date of the campaign (time not accepted)
                                    # end_date=end.date())        # End date of the campaign (time not accepted)

if vessel_type == 'SV3':
    pipeline, config = data_handler.get_pipeline_sv3()
elif vessel_type == 'SV2':
    pipeline, config = data_handler.get_pipeline_sv2()
else:
    raise ValueError(f"Vessel type {vessel_type} not recognized")


### 2.4 Get the acoustic (DFOP00) files from the EarthScope archive

This will go to the EarthScope archive and list files for the specific campaign. The `DataHandler` class will then add this remote file list to the catalog stored within our data directory set above. It will then download only the DFOP00 files to our data directory. 

*If you have already run this cell, the files will likely not need to be downloaded again.*

In [None]:
# Get DFOP00 file list from the archive
remote_filepaths = list_campaign_files(network=network, station=site, campaign=campaign_name)

# Add the data to the data handler
data_handler.add_data_remote(remote_filepaths=remote_filepaths)

# Download the dfop00 files, if not already downloaded (override=False)
data_handler.download_data(file_types='dfop00', override=False)

### 2.5 Read DFOP00 files into shotdata array

Using the SV pipeline chosen above, we will create an array from the DFOP00 files.

In [None]:
# Read DFOP00 files into shotdata array
config.dfop00_config.override=True          # Flag to override existing data
pipeline.config = config
pipeline.process_dfop00()

### 2.6 Convert data from array into a dataframe

Using the shotdata we just stored in a tileDB array, we will create a dataframe to use in the notebook going forward. 

In [None]:
# Set the shotdata array URI (built from the same data directory passed to the DataHandler)
shotdata_uri = f"{data_dir}/{network}/{site}/TileDB/shotdata_db.tdb"

# Read data from the TileDB array
with tiledb.open(shotdata_uri, mode="r") as array:
    shot_data_dataframe = array.df[:]

# Show preview of the data
shot_data_dataframe

***
## 3.  Plot waveglider locations

Now that we have our data ready, we can begin plotting waveglider locations and refining our surveys.

### 3.1 Set plotting functions
We will use these two plotting functions to refine our surveys below. Run the next cell to define them. If you wish to change how the plots look, this is the place to do it. 

In [None]:
def plot_en(df, surveys: List[Survey]=[], save_as: str = None):
    """
    Plots the East and North positions over time for a given dataset, with survey periods highlighted.

    Parameters:
    - df: DataFrame containing the survey data with columns like 'triggerTime', 'east0', and 'north0'.
    - surveys: List of survey objects, each containing 'start', 'end', 'type', and optional 'notes'.
    - save_as: Optional string specifying the filename to save the plot. If None, the plot is not saved.
    """

    # Create a figure with two subplots (one for East, one for North)
    fig, axs = plt.subplots(nrows=2, figsize=(16,10))  

    # Set x and y axis labels
    axs[0].set_ylabel("East (m)")
    axs[1].set_ylabel("North (m)")

    # Scatter plot for East positions
    sc0 = axs[0].scatter(
        df["triggerTime"],
        df["east0"],
        alpha=0.25
    )
    # Scatter plot for North positions
    sc1 = axs[1].scatter(
        df["triggerTime"],
        df["north0"],
        alpha=0.25
    )

    # Generate a rainbow colormap for the surveys
    survey_colors = cm.rainbow(np.linspace(0, 1, len(surveys)))
    # Highlight survey periods on both subplots
    for ax in axs:
        for i, survey in enumerate(surveys):
            start = survey.start
            end = survey.end
            label = survey.type + " " + survey.notes if survey.notes else survey.type

            # Highlight the survey period with a colored span
            ax.axvspan(start, end, color=survey_colors[i], alpha=0.3, label=label)
        
        # Place a major tick at every day
        ax.xaxis.set_major_locator(mdates.DayLocator())
        
        # Format the x-axis to display dates in 'month-day' format
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))

        # Rotate x-axis labels 90 degrees on this subplot
        ax.tick_params(axis='x', labelrotation=90)

    # Add a legend to the first subplot (only when there are surveys to label)
    if len(surveys) > 0:
        axs[0].legend()

    # Save the plot to a file if a filename is provided
    if save_as is not None:
        plt.savefig(save_as)
        

def plot_wg_position(df, start: datetime, end: datetime, survey: Survey = None):
    """
    Plots the antenna position (East vs. North) for a specific survey within a given time range.

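    Parameters:
    - df: DataFrame containing the survey data with columns like 'triggerTime', 'east0', and 'north0'.
    - start: Start time of the survey (datetime object).
    - end: End time of the survey (datetime object).
    - survey: Optional Survey object; if given, its id and type are used in the plot title and file name.

    The figure is saved to the `plots_dir` directory set in section 2.1.
    """
    # Filter the DataFrame for the specified time range
    time_mask = (df['triggerTime'] >= start) & (df['triggerTime'] <= end)
    temp_df = df[time_mask]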

    # Create a figure and axis for the plot
    fig, ax = plt.subplots(figsize=(16,10))

    # Set the title of the plot and the filename for saving the figure
    if survey is not None:
        survey_name = survey.id
        survey_type = survey.type
        title = f"{survey_name} {survey_type} from {start.isoformat()} to {end.isoformat()}"
        save_as = f"{plots_dir}/{survey_name}_{survey_type}.png"
        fig.suptitle(title)
    else:
        title = f"Survey from {start.isoformat()} to {end.isoformat()}"
        save_as = f"{plots_dir}/survey_{start.isoformat()}_{end.isoformat()}.png"
        fig.suptitle(title)

    # Set the x and y axis labels
    ax.set_xlabel("East (m)")
    ax.set_ylabel("North (m)")

    # Optional: Plot the origin point (?)
    #ax.scatter(0, 0, label="Origin", color="magenta",s=100)

    # Convert trigger times to timestamps and scale them for colormap normalization
    colormap_times = temp_df["triggerTime"].apply(lambda x:x.timestamp()).to_numpy()
    colormap_times_scaled = (colormap_times - colormap_times.min())/3600

    # Normalize the colormap to the range of scaled times
    norm = mcolors.Normalize(
        vmin=0,
        vmax=(colormap_times.max() - colormap_times.min()) / 3600,
    )

    # Scatter plot of East vs. North positions, colored by time
    sc = ax.scatter(
        temp_df["east0"],
        temp_df["north0"],
        c=colormap_times_scaled,
        cmap="viridis",
        label="Antenna Position",
        norm=norm,
        alpha=0.25
    )

    # Add a colorbar to indicate time in hours (the scatter mappable already carries the norm)
    cbar = fig.colorbar(sc, ax=ax, label="Time (hr)")

    # Add a legend to the plot
    ax.legend()

    # Save the plot to a file
    plt.savefig(save_as)

    # Print the start and end times of the filtered data
    print(f"Start Time: {temp_df.triggerTime.iloc[0].isoformat()} \nEnd Time: {temp_df.triggerTime.iloc[-1].isoformat()}")


### 3.2 Plot & refine the full campaign

The next cell plots the entire shot data dataframe as well as the waveglider map positions. To trim the campaign:
1. Run the cell and view the full campaign via the plots below
2. Adjust the start and end times by editing the `timedelta` offsets that are added to `start` and subtracted from `end`.
3. Repeat as necessary to trim the campaign start and end dates
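
The trimming step above can be sketched on synthetic data. The hourly timestamps below are made up for illustration, and only the `triggerTime` column name is taken from this notebook. Both conditions are combined into one boolean mask built from the same dataframe, which avoids index-alignment warnings from pandas.

```python
from datetime import timedelta

import pandas as pd

# Synthetic stand-in for the shot data: one ping per hour for two days
df = pd.DataFrame(
    {"triggerTime": pd.date_range("2023-05-01", periods=48, freq=pd.Timedelta(hours=1))}
)

# Trim 3 hours from the beginning and 2.5 hours from the end
start = df["triggerTime"].iloc[0] + timedelta(hours=3)
end = df["triggerTime"].iloc[-1] - timedelta(hours=2, minutes=30)

# One mask for both bounds, built from the dataframe being filtered
mask = (df["triggerTime"] >= start) & (df["triggerTime"] <= end)
trimmed = df.loc[mask]

print(f"Kept {len(trimmed)} of {len(df)} rows")
print(trimmed["triggerTime"].iloc[0], "to", trimmed["triggerTime"].iloc[-1])
```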


In [None]:
# Adjust to remove any junk data at beginning or end if needed
start = shot_data_dataframe.triggerTime.iloc[0] + timedelta(hours=0, minutes=0)
end = shot_data_dataframe.triggerTime.iloc[-1] - timedelta(hours=0, minutes=0)

# ----------------------- #
# Filter the DataFrame for the specified time range set above
time_mask = (shot_data_dataframe['triggerTime'] >= start) & (shot_data_dataframe['triggerTime'] <= end)
temp_df = shot_data_dataframe[time_mask]
 
# Plot the East/North positions over time
plot_en(df=temp_df)

# Plot the waveglider position
plot_wg_position(df=shot_data_dataframe, start=start, end=end)

# Print total number of days in campaign
print(f"Total number of days in campaign: {(end-start).days} days")

### 3.3 Optional: Create the campaign 

If the campaign we are working on does not exist in the metadata, we will need to create that first. If a campaign already exists, skip this section and start working on creating surveys. 

To create the campaign, make sure you have all of the required information below. The `name`, `vesselCode`, `start` and `end` values come from variables already set above. Enter the `type` and any optional information if you know it. Run the cell and check the JSON output to ensure it is correct. If it is incorrect, edit and rerun the cell. If it is correct, add the campaign to the site metadata below. 
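
Checking the JSON by eye can be backed up with a small sanity check. The sketch below is a hypothetical helper, not part of ES-SFGTools. It assumes that the required keys are the ones listed in the cell below, and that `deploy` and `measure` are the only valid campaign types, as the comment in that cell suggests.

```python
from datetime import datetime

REQUIRED_KEYS = ("name", "vesselCode", "start", "end", "type")
ALLOWED_TYPES = {"deploy", "measure"}  # assumption taken from the comment in the campaign cell


def find_campaign_problems(campaign: dict) -> list:
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for key in REQUIRED_KEYS:
        if not campaign.get(key):
            problems.append(f"missing value for '{key}'")
    if campaign.get("type") and campaign["type"] not in ALLOWED_TYPES:
        problems.append(f"type should be one of {sorted(ALLOWED_TYPES)}")
    if campaign.get("start") and campaign.get("end"):
        # Assumes ISO strings that datetime.fromisoformat can parse
        if datetime.fromisoformat(campaign["start"]) >= datetime.fromisoformat(campaign["end"]):
            problems.append("start must be earlier than end")
    return problems


example = {
    "name": "2023_A_1063",
    "vesselCode": "1063",
    "start": "2023-05-01T00:00:00",
    "end": "2023-05-10T00:00:00",
    "type": "",
}
problems = find_campaign_problems(example)
print(problems)
```

An empty list only means these basic checks passed; the metadata classes may apply their own validation when the campaign is added.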

In [None]:
campaign = {}
# ----------------------- Update these values if needed ----------------------- 

# -- Required information for new campaign --
campaign['name'] = campaign_name
campaign['vesselCode'] = campaign_name[-4:]  # Extract the last 4 characters of the campaign name
campaign['start'] = start.isoformat()         # Start date of campaign
campaign['end'] = end.isoformat()            # End date of campaign

campaign['type'] = ""           # type of campaign: deploy | measure 

# -- Optional: Enter information known about the people and vessels involved in the campaign --
campaign['launchVesselName'] = ""             # launch vessel name used in campaign
campaign['recoveryVesselName'] = ""           # recovery vessel name used in campaign
campaign['principalInvestigator'] = ""        # PI name 
campaign['cruiseName'] = ""                   # Name of cruise
campaign['technicianName'] = ""               # technician name
campaign['technicianContact'] = ""            # technician contact information (email/phone)


# ----------------------- Do not update code below ----------------------- 
print(json.dumps(campaign, indent=2))

#### Confirm the output is correct and then run the next cell to add the campaign to the site metadata

In [None]:
# ----------------------- Do not update code below -----------------------
site_metadata.run_component(component_type=TopLevelSiteGroups.CAMPAIGNS, component_metadata=campaign, add_new=True)

In [None]:
# If you want to view the entire site metadata, you can print it out
site_metadata.print_json()

*** 
## 4. Create surveys
In this section we will begin creating surveys. It may be run through multiple times as needed. 

### 4.1 Plot individual survey
Using the campaign start and end times defined above, adjust the days, hours and minutes passed to `timedelta` to narrow in on a single survey. Each time you run the cell, new plots are generated showing the east and north positions as well as the waveglider map position. 
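
One way to get first guesses for the `timedelta` offsets is to look for long pauses between consecutive pings. The sketch below uses synthetic timestamps, and `find_time_gaps` is a hypothetical helper. Whether a pause actually marks a survey boundary is an assumption that still needs to be confirmed visually in the plots.

```python
import pandas as pd


def find_time_gaps(times: pd.Series, min_gap: pd.Timedelta) -> pd.DataFrame:
    """Return the intervals where consecutive timestamps are more than min_gap apart."""
    ordered = times.sort_values().reset_index(drop=True)
    deltas = ordered.diff()          # time since the previous ping (NaT for the first row)
    is_gap = deltas > min_gap        # NaT compares as False, so the first row is never a gap
    gaps = pd.DataFrame(
        {
            "gap_start": ordered.shift(1)[is_gap],  # last ping before the pause
            "gap_end": ordered[is_gap],             # first ping after the pause
            "duration": deltas[is_gap],
        }
    )
    return gaps.reset_index(drop=True)


# Two synthetic blocks of pings, 20 seconds apart, separated by several hours
block1 = pd.date_range("2023-05-01 00:00", periods=10, freq=pd.Timedelta(seconds=20))
block2 = pd.date_range("2023-05-01 06:00", periods=10, freq=pd.Timedelta(seconds=20))
times = pd.Series(block1.append(block2))

gaps = find_time_gaps(times, min_gap=pd.Timedelta(hours=1))
print(gaps)
```

In this notebook the same function could be applied to `shot_data_dataframe["triggerTime"]`, with `min_gap` chosen to suit the ping rate of the campaign.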

Be sure to update the survey interval on each subsequent survey created, otherwise you will get an error when trying to add the survey to the metadata. If a survey already existed prior to running this notebook, you may need to start with a higher number initially. Check the survey IDs in the metadata if so. 
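
If surveys already exist, the next free interval can be derived from their IDs rather than read off by hand. This is a minimal sketch that assumes IDs follow the `<campaign_name>_<number>` form used in the cell below; `next_survey_interval` is a hypothetical helper.

```python
def next_survey_interval(existing_ids, campaign_name):
    """Return one more than the highest numeric suffix found among the existing survey IDs."""
    prefix = campaign_name + "_"
    numbers = []
    for survey_id in existing_ids:
        # Only consider IDs that belong to this campaign and end in a plain number
        suffix = survey_id[len(prefix):] if survey_id.startswith(prefix) else ""
        if suffix.isdigit():
            numbers.append(int(suffix))
    return max(numbers, default=0) + 1


existing = ["2023_A_1063_1", "2023_A_1063_2"]
next_interval = next_survey_interval(existing, "2023_A_1063")
print(next_interval)
```

In this notebook, the list of IDs could be built from the `surveys` list collected in section 2, for example `[s.id for s in surveys]`.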

In [None]:
# Change the survey interval to the next number on each subsequent survey creation
survey_interval = 1

survey_start = start + timedelta(days=0, hours=1, minutes=0)
survey_end = end - timedelta(days=6, hours=6, minutes=0)

# Create a new survey object
survey = Survey(
    id=campaign_name + "_"  + str(survey_interval),
    start=survey_start,
    end=survey_end,
    type="",                                        # If you can identify, add the type of survey
    benchmarkIDs=[],                                # If you can identify the benchmark IDs, add them here
    notes="Visual identification",                  # Any additional notes about the survey
)

plot_en(temp_df, [survey])
plot_wg_position(df=shot_data_dataframe, start=survey.start, end=survey.end, survey=survey)

### 4.2 Add the survey to our metadata

Once you have narrowed down your survey and any associated metadata, run the next cell to add it to the metadata class. If you wish to create more surveys, repeat step **4.1** above. To see all surveys created so far, run the two plotting cells in section 5.

In [None]:
site_metadata.run_sub_component(component_type=TopLevelSiteGroups.CAMPAIGNS, component_name=campaign_name,
                            sub_component_type=SubLevelSiteGroups.SURVEYS, sub_component_metadata=survey.__dict__,
                            add_new=True)

***
## 5. Finalize and Export Survey Data
### 5.1 Plot East/North positions with each survey shaded

This cell grabs all of the surveys added to the campaign metadata and plots the east/north positions.

In [None]:
surveys=[]
for campaign in site_metadata.campaigns:
    if campaign.name == campaign_name:
        for survey in campaign.surveys:
            surveys.append(survey)

# Plot the surveys we have so far
plot_en(temp_df, surveys, save_as=f"{plots_dir}/{site}_{campaign_name}_surveys.png")

### 5.2 View all survey maps

Run this cell to view and save all of the survey plots created thus far. 

In [None]:
for individual_survey in surveys:
    print(f"Survey: {individual_survey.id}")
    plot_wg_position(df=shot_data_dataframe, start=individual_survey.start, end=individual_survey.end, survey=individual_survey)

### 5.3 Export updated site metadata to file

Once you have finished identifying and adding survey metadata, save the updated metadata to a file. File name format is `[site].[new_surveys].[date].json`.
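
Because the file name only includes the date, exporting twice on the same day would write to the same path. The sketch below shows one possible way to build the name and avoid that. The numeric suffix is an assumption added for illustration and is not part of the naming format described above; `build_export_path` is a hypothetical helper.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def build_export_path(site: str, directory: Path, on: Optional[date] = None) -> Path:
    """Build '[site].new_surveys.[date].json', adding a counter if that file already exists."""
    on = on or date.today()
    base = f"{site}.new_surveys.{on.isoformat()}"  # isoformat gives YYYY-MM-DD
    candidate = directory / f"{base}.json"
    counter = 1
    while candidate.exists():
        # Hypothetical convention: insert a counter before the extension
        candidate = directory / f"{base}.{counter}.json"
        counter += 1
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    first = build_export_path("NBR1", out_dir, on=date(2025, 3, 19))
    first.touch()  # pretend an export already happened on this date
    second = build_export_path("NBR1", out_dir, on=date(2025, 3, 19))

print(first.name)
print(second.name)
```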

In [None]:
SITE_4_CHAR_ID = site           # 4 char site ID

if not SITE_4_CHAR_ID:
    raise ValueError("Please enter a 4 char site ID")

# Export site metadata to a json file
# Add date to the file name
date = datetime.now().strftime("%Y-%m-%d")
file_path = f"./{SITE_4_CHAR_ID}.new_surveys.{date}.json"          # Export file path you wish to store
site_metadata.export_site(file_path)