# Demo 1: Track Dev Branch Commits
- Extract a single file daily and parse it.
- The client requests rt.sh directly from GitHub.
- rt.sh is read and preprocessed, and the timestamps of the relevant datasets that have been pushed to GitHub are extracted.
- A file containing the datasets' timestamps is generated.
- The program compares the last log file with the most recent file containing the datasets' timestamps.
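
The parse-and-compare steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `rt_revision_tracker`: the variable names are taken from the tracker output shown later in this notebook, while the assignment format inside rt.sh, the helper names `extract_timestamps` and `changed_entries`, and the sample text are all assumptions made for the example. Fetching rt.sh from GitHub is left out.

```python
import re

# Variable names as they appear in the tracker output of this notebook;
# the exact line format inside rt.sh is an assumption.
TRACKED_VARS = ("BL_DATE", "INPUTDATA_ROOT", "INPUTDATA_ROOT_WW3", "INPUTDATA_ROOT_BMIC")

ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Z0-9_]+)=(.*)$")
TIMESTAMP = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def extract_timestamps(script_text):
    """Return {variable: timestamp} for tracked assignments in script_text."""
    found = {}
    for line in script_text.splitlines():
        match = ASSIGNMENT.match(line)
        if not match or match.group(1) not in TRACKED_VARS:
            continue
        stamps = TIMESTAMP.findall(match.group(2))
        if stamps:
            # Keep the last standalone 8-digit run on the line.
            found[match.group(1)] = stamps[-1]
    return found


def changed_entries(previous, current):
    """Return the variables whose timestamp differs from the previous log."""
    return {k: v for k, v in current.items() if previous.get(k) != v}


# Hypothetical excerpt, written only to exercise the parser.
sample = """\
BL_DATE=20220805
INPUTDATA_ROOT=${INPUTDATA_ROOT:-$DISKNM/NEMSfv3gfs/input-data-20220414}
INPUTDATA_ROOT_WW3=${INPUTDATA_ROOT}/WW3_input_data_20220624
"""
current = extract_timestamps(sample)
previous = {"BL_DATE": "20220729", "INPUTDATA_ROOT": "20220414"}
print(changed_entries(previous, current))
```

Comparing plain dictionaries keeps the daily check cheap: only variables that are new or whose timestamp moved are reported.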

### Prerequisite: Set Up Conda Environment
Once you have conda installed on your machine, perform the following steps to create a conda environment:
- Create the environment from the .yml file. Note: the environment name is set within the .yml file.
    - $ conda env create -f data_maintenance_env.yml

- Activate the new environment via:
   - $ conda activate data_maintenance_env
   
- Verify that the new environment was installed correctly via:
    - $ conda info --envs

- Confirm the data_maintenance_env.yml dependencies were installed via:
    - $ conda list

In [None]:
from rt_revision_tracker import rt_revision_tracker

# Restart the accumulation of timestamps.
rt_revision_tracker().reset_tracker()


In [None]:
from rt_revision_tracker import rt_revision_tracker

# Accumulation of timestamps since time of reset.
data_log_dict = rt_revision_tracker().populate()
data_log_dict

#### Findings:
- BL dataset timestamps will not necessarily match the date on which the PR was approved; approval could take an unknown number (N) of days.

- The current practice of marking a baseline change with a GitHub label ("Baseline change") does not ensure that the timestamp was actually changed.

- 'BM_IC' # IC folder's prefix

- 'develop' # Baseline folder's prefix

- 'input-data' # Input folder's prefix

- 'WW3_input_data' # WW3 Input folder's prefix
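
As a rough illustration of how the prefixes listed above could be used, the sketch below splits a dataset folder name into its prefix and 8-digit timestamp. The helper name `split_folder_name` is hypothetical, and accepting either `-` or `_` as the separator before the date is an assumption.

```python
import re

# Prefixes listed above; the "-" or "_" separator before the date is an assumption.
FOLDER_PREFIXES = ("WW3_input_data", "input-data", "develop", "BM_IC")
FOLDER_NAME = re.compile(
    r"^(" + "|".join(re.escape(p) for p in FOLDER_PREFIXES) + r")[-_](\d{8})$"
)


def split_folder_name(name):
    """Return (prefix, timestamp) for a dataset folder, or None if it is not one."""
    match = FOLDER_NAME.match(name)
    return (match.group(1), match.group(2)) if match else None


for name in ["develop-20220805", "BM_IC-20220207", "adjust_permissions.sh"]:
    print(name, "->", split_folder_name(name))
```

Names that do not start with one of the four prefixes, such as helper scripts, return `None` and can be skipped.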

#### Suggestion:
- Why not name the datasets after the date on which they were approved? A script could then collect them by PR approval date, assuming the baseline GitHub labels are applied correctly.


# Demo 2: Filter Window Featuring Latest N Months (e.g. 2 Months)

In [1]:
# Define the window used to filter datasets by retrieval date.
# In this scenario, capture the past 60 days of data.
from rt_tracker_filter import RtTrackerFilter
linked_home_dir = "/home/schin"
orion_rt_data_dir = "/work/noaa/nems/emc.nemspara/RT/NEMSfv3gfs/"
hera_rt_data_dir = "/scratch1/NCEPDEV/nems/emc.nemspara/RT/NEMSfv3gfs/"
maintenance_wrapper = RtTrackerFilter(linked_home_dir, hera_rt_data_dir)
maintenance_wrapper.maintenance_window(60)

All Primary Dataset Folders & Files In Main Directory (/home/schin/work/noaa/nems/emc.nemspara/RT/NEMSfv3gfs/):
['develop-20220629', 'develop-20220613', 'develop-20220805', 'ufs-public-release-v2-20210212', 'develop-20220623', 'adjust_permissions.sh', 'ufs-public-release-v2-20210208', 'BM_IC-20220207', 'develop-20220316', 'develop-20220720', 'develop-20220722', 'develop-20220713', 'develop-20220707', 'develop-20220601', 'develop-20220425', 'develop-20220719', 'develop-20220616', 'input-data-20220414', 'develop-20220701', 'BM_IC-20210717', 'develop-20220729', 'develop-20220706']

Data Tracker's Latest Set of Timestamped Datasets Retrieved was on 08-12-2022:
{'BL_DATE': ['20220805'], 'INPUTDATA_ROOT': ['20220414'], 'INPUTDATA_ROOT_WW3': ['20220624'], 'INPUTDATA_ROOT_BMIC': ['20220207']}

Data Tracker's Retrieval Dates:
dict_keys(['06-30-2022', '07-05-2022', '07-07-2022', '07-13-2022', '07-15-2022', '07-19-2022', '07-20-2022', '07-21-2022', '07-26-2022', '07-27-20


All UFS-WM RT Cloud Data are w/in Requested Window (Latest 60 Days)!
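
The internals of `maintenance_window` are not shown in this notebook. As a rough sketch of the idea, and assuming the window is applied to the tracker's `MM-DD-YYYY` retrieval dates, the filter could look like the following. The function name `within_window` and its `today` parameter are hypothetical.

```python
from datetime import date, datetime, timedelta


def within_window(retrieval_dates, days, today):
    """Return the MM-DD-YYYY retrieval dates that fall in the last `days` days."""
    cutoff = today - timedelta(days=days)
    kept = []
    for stamp in retrieval_dates:
        retrieved = datetime.strptime(stamp, "%m-%d-%Y").date()
        if cutoff <= retrieved <= today:
            kept.append(stamp)
    return kept


# A few retrieval dates taken from the output above.
dates = ["06-30-2022", "07-05-2022", "07-27-2022"]
print(within_window(dates, 60, today=date(2022, 8, 12)))
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, makes the filter reproducible when re-running the notebook on a later day.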
