# Demo: Consolidating Data Maps for Land DA (Data Assimilation)
The mapping tool is based on the data structure of the Land DA datasets currently required for testing. Because the Land DA application's forecasting data requirements can vary with each user's case, the mapping tool maps only the data files that pertain to the case required for regression testing, which is specified by the code manager (CM) team. At this time, three datasets are sourced for testing the Land DA application:

1) Baseline data used within the UFS-WM RT framework (__Source:__ https://noaa-ufs-regtests-pds.s3.amazonaws.com/index.html)
2) Input data used within the UFS-WM RT framework (__Source:__ https://noaa-ufs-regtests-pds.s3.amazonaws.com/index.html)
3) Data extracted from the Land DA TAR-based object (e.g. landda_inputs.tar.gz_v1.1, land_da_new.tar.gz_v1.2, Landdav1.2.0_input_data.tar.gz) issued by the current CM responsible for the Land DA application (__Source:__ https://noaa-ufs-land-da-pds.s3.amazonaws.com/index.html)

Per _retrieve_data.py_, the current Land DA application's test case requires subsets of the following timestamped datasets:

- _develop-20231122_
- _input-data-20221101_
- _Landdav1.2.0_input_data.tar.gz_

## Requirements:

__To generate data maps of Land DA's sourced datasets, execute the following within the Terminal:__

- _python map_rt_data.py -b land-da -k_input_data input-data-20221101 -k_bl_data develop-20231122_

- _python map_land_da_v1p2_data.py -b land-da -k Landdav1.2.0_input_data.tar.gz_

__To consolidate the data maps generated above, execute the following within the Terminal:__

- _python consolidate_maps.py -b land-da -bl_ts 20231122 -input_ts 20221101 -tar_fn Landdav1.2.0_input_data.tar.gz_land-da_data_map.csv -ver 1.2.0_

(OR)

Run the commands below within this demo notebook.
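
The terminal commands listed above share the same timestamps and version string, so they can be assembled from a few variables. The sketch below only builds and prints the command lines; it does not run them. The helper name `build_commands` is hypothetical, the value passed to `-b` is forwarded as-is because its meaning is not described here, and invoking every script through `python` is an assumption.

```python
import shlex

# Values taken from the commands listed above.
BL_DATE = "20231122"
INPUTDATA_DATE = "20221101"
VER = "1.2.0"
TAR_NAME = f"Landdav{VER}_input_data.tar.gz"


def build_commands(bl_date, input_date, ver, tar_name, b_value="land-da"):
    """Return the three command lines as argument lists (nothing is executed)."""
    map_rt = ["python", "map_rt_data.py", "-b", b_value,
              "-k_input_data", f"input-data-{input_date}",
              "-k_bl_data", f"develop-{bl_date}"]
    map_tar = ["python", "map_land_da_v1p2_data.py", "-b", b_value,
               "-k", tar_name]
    consolidate = ["python", "consolidate_maps.py", "-b", b_value,
                   "-bl_ts", bl_date,
                   "-input_ts", input_date,
                   "-tar_fn", f"{tar_name}_land-da_data_map.csv",
                   "-ver", ver]
    return [map_rt, map_tar, consolidate]


commands = build_commands(BL_DATE, INPUTDATA_DATE, VER, TAR_NAME)
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the timestamps in one place makes it harder for the mapping step and the consolidation step to drift onto different dataset dates.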


## Libraries/Modules

In [65]:
import pandas as pd
from pandas import ExcelWriter

## Read Data Maps of Datasets Sourced by Specified Version of Land DA Application's Test Case

In [None]:
# Read files featuring the data maps of UFS-WM RT baseline datasets required for Land DA's v1.2.0
BL_DATE = '20231122'
ufs_bl_df = pd.read_csv(f'../results/rt_baseline_{BL_DATE}_data_map.csv')


In [None]:
# Read files featuring the data maps of UFS-WM RT input datasets required for Land DA's v1.2.0
INPUTDATA_DATE = '20221101'
ufs_input_df = pd.read_csv(f'../results/rt_input_{INPUTDATA_DATE}_data_map.csv')


In [None]:
# Read files featuring the data maps of the Land DA's TAR-based dataset required for Land DA's v1.2.0

# File to reference for the updated Land DA's v1.2.0
ver = '1.2.0'
fn = f'Landdav{ver}_input_data.tar.gz_land-da_data_map.csv'

# File to reference for the Land DA's v1.2.0
# ver = '1.2.0'
# fn = f'land_da_new.tar.gz_v{ver}_land-da_data_map.csv'

# File to reference for the Land DA's v1.1.0
# ver = '1.1.0'
# fn = f'landda_inputs.tar.gz_v{ver}_land-da_data_map.csv'

# Read referenced files featuring data maps
land_da_input_df = pd.read_csv(f'../results/{fn}')
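
The filters later in this notebook rely on specific column names, so it can help to confirm those columns exist right after loading a data map. The following is a minimal sketch of such a check; the helper `missing_columns` and the stand-in DataFrame are hypothetical, and the column lists are copied from the filter expressions used in this notebook.

```python
import pandas as pd

# Column names copied from the filter expressions used in this notebook.
REQUIRED_INPUT_COLS = ["Dataset", "UFS Component", "Data File",
                       "Sub-Category", "Resolution (C)"]
REQUIRED_BL_COLS = ["Dataset", "Compiler", "Test Name"]


def missing_columns(df, required):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]


# Stand-in frame with a made-up row so the sketch runs without the CSV files.
demo_df = pd.DataFrame({"Dataset": ["develop-20231122"], "Compiler": ["intel"]})
print(missing_columns(demo_df, REQUIRED_BL_COLS))  # → ['Test Name']
```

With the real data, the same call could be applied to `ufs_input_df` and `ufs_bl_df` before any filtering.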


## Generate Data Map of Input Datasets Required for Current Land DA Application's Test Case

Per Land DA's _retrieve_data.py_, the current Land DA application's test case will require the following input datasets extracted from the UFS-WM RT's S3 bucket: 

- __DATM data__
    - {project_source_dir}/../inputs/NEMSfv3gfs/DATM_GSWP3_input_data/*
      
    - s3://noaa-ufs-regtests-pds/input-data-${INPUTDATA_DATE}/DATM_GSWP3_input_dat*

- __NOAHMP Initial Condition data__
    - {project_source_dir}/../inputs/NEMSfv3gfs/NOAHMP_IC/*
      
    - s3://noaa-ufs-regtests-pds/input-data-${INPUTDATA_DATE}/NOAHMP_IC
 
- __Non-Fixed FV3 data__
    - {project_source_dir}/../inputs/NEMSfv3gfs/FV3_input_data/INPUT
  
    - s3://noaa-ufs-regtests-pds/input-data-${INPUTDATA_DATE}/FV3_input_data/INPUT/C96_grid.tile{1-6}.nc
      
    - s3://noaa-ufs-regtests-pds/input-data-${INPUTDATA_DATE}/FV3_input_data/INPUT/grid_spec.
 
- __Fixed FV3 data__
    - {project_source_dir}/../inputs/NEMSfv3gfs/FV3_fix_tiled/C96/*
      
    - s3://noaa-ufs-regtests-pds/input-data-${INPUTDATA_DATE}/FV3_fix_tiled/C96/*
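
As an illustration of how the S3 locations above depend on the timestamp, the sketch below expands the `C96_grid.tile{1-6}.nc` pattern into individual object URIs. It assumes that `{1-6}` means tiles 1 through 6; the function name `c96_grid_tile_uris` is hypothetical.

```python
INPUTDATA_DATE = "20221101"
S3_PREFIX = f"s3://noaa-ufs-regtests-pds/input-data-{INPUTDATA_DATE}"


def c96_grid_tile_uris(prefix):
    """Expand the C96_grid.tile{1-6}.nc pattern (assumed to mean tiles 1 to 6)."""
    return [f"{prefix}/FV3_input_data/INPUT/C96_grid.tile{tile}.nc"
            for tile in range(1, 7)]


tile_uris = c96_grid_tile_uris(S3_PREFIX)
for uri in tile_uris:
    print(uri)
```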

In [None]:
# Rows belonging to the UFS-WM RT input dataset for the selected timestamp
in_input_dataset = ufs_input_df['Dataset'] == f'input-data-{INPUTDATA_DATE}'

# Filter to the "DATM" & "NOAHMP Initial Condition" data required from the UFS-WM RT S3
ic_input_df = ufs_input_df[in_input_dataset
                           & ufs_input_df['UFS Component'].isin(['DATM_GSWP3_input_data', 'NOAHMP_IC'])]

# Filter to the "Non-Fixed FV3" data required from the UFS-WM RT S3
in_fv3_input = (in_input_dataset
                & (ufs_input_df['UFS Component'] == 'FV3_input_data')
                & (ufs_input_df['Sub-Category'] == 'INPUT'))
grid_spec_df = ufs_input_df[in_fv3_input & (ufs_input_df['Data File'] == 'grid_spec.nc')]
grid_tiles_df = ufs_input_df[in_fv3_input & ufs_input_df['Data File'].str.startswith('C96_grid.tile')]
nonfixed_input_df = pd.concat([grid_spec_df, grid_tiles_df])

# Filter to the "Fixed FV3" data required from the UFS-WM RT S3
fixed_input_df = ufs_input_df[in_input_dataset
                              & (ufs_input_df['UFS Component'] == 'FV3_fix_tiled')
                              & (ufs_input_df['Resolution (C)'] == 96)]
fixed_input_df.head(10)
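
The "Non-Fixed FV3" selection in the cell above can also be expressed as one combined boolean mask instead of two filters joined with `pd.concat`. The sketch below shows that alternative on a small, made-up table; the rows, the `OTHER` sub-category, and the `toy_` names are placeholders, and only the column names and the two file patterns come from this notebook.

```python
import pandas as pd

# Made-up rows that reuse the column names from the real data map.
toy_df = pd.DataFrame({
    "Dataset": ["input-data-20221101"] * 4,
    "UFS Component": ["FV3_input_data", "FV3_input_data",
                      "FV3_input_data", "NOAHMP_IC"],
    "Sub-Category": ["INPUT", "INPUT", "OTHER", "INPUT"],
    "Data File": ["grid_spec.nc", "C96_grid.tile1.nc",
                  "C96_grid.tile1.nc", "made_up_file.nc"],
})

# Rows under FV3_input_data/INPUT ...
is_fv3_input = ((toy_df["UFS Component"] == "FV3_input_data")
                & (toy_df["Sub-Category"] == "INPUT"))
# ... that are either grid_spec.nc or one of the C96 grid tiles.
is_wanted_file = ((toy_df["Data File"] == "grid_spec.nc")
                  | toy_df["Data File"].str.startswith("C96_grid.tile"))

nonfixed_toy_df = toy_df[is_fv3_input & is_wanted_file]
print(nonfixed_toy_df["Data File"].tolist())
```

One difference to keep in mind: a single mask preserves the original row order, whereas `pd.concat` places the `grid_spec.nc` rows ahead of the tile rows.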

## Generate Data Map of Baseline Datasets Required for Current Land DA Application's Test Case

Currently, the Land DA application's test case requires one baseline dataset:

- __DATM CDEPS LAND GSWP3__
      
    - {project_source_dir}/../inputs/NEMSfv3gfs/develop-{BL_DATE}/intel/datm_cdeps_lnd_gswp3/*
      
    - s3://noaa-ufs-regtests-pds/develop-${BL_DATE}/datm_cdeps_lnd_gswp3_intel/*


In [None]:
# Filter to the "DATM CDEPS LAND GSWP3" data required from the UFS-WM RT S3
bl_filtered_df = ufs_bl_df[(ufs_bl_df['Dataset'] == f'develop-{BL_DATE}')
                           & (ufs_bl_df['Compiler'] == 'intel')
                           & (ufs_bl_df['Test Name'] == 'datm_cdeps_lnd_gswp3')]
bl_filtered_df

## Generate Data Map of TAR Input Land DA Datasets Required for Current Land DA Application's Test Case

Per Land DA's _retrieve_data.py_, the current Land DA application's test case requires a subset of the data extracted from the Land DA TAR-based object, _Landdav1.2.0_input_data.tar.gz_:

- For Land DA Application v1.2.0, the Land DA TAR-based object can be found within the Land DA's S3 bucket: __https://noaa-ufs-land-da-pds.s3.amazonaws.com/Landdav1.2.0_input_data.tar.gz__
        

In [None]:
# Land DA TAR-based object
land_da_input_df

## Consolidate All Generated Data Maps Required for Current Land DA Application's Test Case
        

In [None]:
# Consolidate all generated data maps required for the specified version of the Land DA application's test case.
list_dfs = [ic_input_df,
            nonfixed_input_df, 
            fixed_input_df, 
            bl_filtered_df, 
            land_da_input_df]
names = ["DATM_NOAHMP_IC", 
         "NonFixed_FV3", 
         "Fixed_FV3", 
         "Baseline", 
         "LANDDA_TAR"]
with ExcelWriter(f'../results/land_da_test_case_{ver}_data_maps.xlsx') as writer:
    # Write each data map to its own sheet, pairing sheet names with DataFrames by position
    for name, df in zip(names, list_dfs):
        df.to_excel(writer, sheet_name=name, index=False)
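
Because each data map is written to a named worksheet, the sheet names must satisfy Excel's constraints: at most 31 characters, none of the characters `[ ] : * ? / \`, and no duplicates (Excel compares sheet names case-insensitively). The sketch below checks the names used in the cell above before anything is written; the helper `check_sheet_names` is hypothetical and not part of the consolidation script.

```python
# Sheet names copied from the consolidation cell.
names = ["DATM_NOAHMP_IC", "NonFixed_FV3", "Fixed_FV3", "Baseline", "LANDDA_TAR"]

MAX_SHEET_NAME_LEN = 31
INVALID_CHARS = set("[]:*?/\\")


def check_sheet_names(sheet_names):
    """Return a list of human-readable problems; an empty list means all names pass."""
    problems = []
    seen = set()
    for name in sheet_names:
        if len(name) > MAX_SHEET_NAME_LEN:
            problems.append(f"{name!r} is longer than {MAX_SHEET_NAME_LEN} characters")
        if INVALID_CHARS & set(name):
            problems.append(f"{name!r} contains a character Excel does not allow")
        if name.lower() in seen:
            problems.append(f"{name!r} duplicates an earlier sheet name")
        seen.add(name.lower())
    return problems


print(check_sheet_names(names))
```

Running a check like this first keeps a naming problem from surfacing only partway through writing the workbook.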
        