# Model Data Preparation - Second Part (Prep Model File)

This second half of Step 4 completes the meteorology data, joins it to the remote sensing data, and prepares the final model input file.

Now that the remote sensing data is prepped, we can proceed to steps 3 and 4:

1. ~~Write each field's soil and irrigation information into a study-wide properties file.~~
2. ~~Process the Landsat data and run a simple analysis of per-field NDVI dynamics that will provide some information on likely subseasonal irrigation application dates, harvest, and fallowing.~~
3. Join our Earth Engine extracts and meteorology data into a per-field time series.
4. Finally, write a single model input file that has all of the data needed.

Step 3 depends on a successful run of Step 2, and Step 4 depends on Step 3, so ensure the code runs to completion on each before moving on.
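
Each step consumes files written by the one before it, so a quick existence check can catch an incomplete run early. The helper below is a hypothetical sketch: `check_step_outputs` is not part of SWIM. Pass it whichever output paths your earlier step was expected to write.

```python
import os


def check_step_outputs(required_paths):
    """Return the paths in `required_paths` that do not exist on disk.

    Hypothetical helper, not part of SWIM: an empty list means every
    expected output from the previous step is present.
    """
    return [p for p in required_paths if not os.path.exists(p)]
```

For example, `check_step_outputs([landsat, cuttings_json])` uses paths defined later in this notebook. It should return an empty list before you call `prep_fields_json`.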

Note: In this tutorial we specify each data directory by hand as it is needed. This gets messy, but it is worth doing once to learn the workflow in our first project. In the next tutorial, we will standardize the directory structure for the calibration project and use a configuration file to specify all directories, the model metadata, the date range of the study period, and so on. This will simplify our lives by hiding much of what we do in this tutorial under the hood of our SWIM car, so we can focus on calibration.
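
Here is a minimal sketch of a configuration-driven setup, to give a feel for where this is headed. The JSON layout, the key names, and the `load_config` function are all hypothetical illustrations. They are not the configuration format the next tutorial uses. The relative paths simply mirror the ones typed out by hand in this notebook.

```python
import json
import os

# Hypothetical configuration, for illustration only.
EXAMPLE_CONFIG = """
{
  "project_dir": "tutorials/1_Boulder/data",
  "start_date": "2004-01-01",
  "end_date": "2022-12-31",
  "paths": {
    "fields_gridmet": "gis/mt_sid_boulder_gfid.shp",
    "met": "met_timeseries",
    "landsat": "landsat/remote_sensing.csv",
    "snow": "snodas/snodas.json"
  }
}
"""


def load_config(config_text, root):
    """Parse a JSON config and return (absolute paths by name, (start, end))."""
    cfg = json.loads(config_text)
    data_dir = os.path.join(root, cfg['project_dir'])
    # resolve every relative path against the project data directory
    paths = {name: os.path.join(data_dir, rel) for name, rel in cfg['paths'].items()}
    return paths, (cfg['start_date'], cfg['end_date'])
```

In the notebook, `paths, (start, end) = load_config(EXAMPLE_CONFIG, root)` would replace the block of hand-written `os.path.join` calls.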

```python
import os
import sys
import json

import numpy as np
import pandas as pd
import geopandas as gpd

# append the project root to the Python path so the `prep` package can be imported
root = os.path.abspath('../../..')
sys.path.append(root)
```

## 3. Join the Earth Engine and meteorology time series.

We now specify the inputs for our time series, which will include irrigated and unirrigated ETf and NDVI, plus all the meteorology data we pulled from GridMET and NLDAS-2. We will also need the shapefile we built earlier with the associated GridMET 'GFID' attribute added:

```python
# Step-specific imports
from prep.field_timeseries import join_daily_timeseries
```

```python
# We can start to see why we want all these paths in a configuration file:
# it feels like a waste of time writing some of these for the third or fourth time.
fields_gridmet = os.path.join(root, 'tutorials', '1_Boulder', 'data', 'gis', 'mt_sid_boulder_gfid.shp')
met = os.path.join(root, 'tutorials', '1_Boulder', 'data', 'met_timeseries')
landsat = os.path.join(root, 'tutorials', '1_Boulder', 'data', 'landsat', 'remote_sensing.csv')
snow = os.path.join(root, 'tutorials', '1_Boulder', 'data', 'snodas', 'snodas.json')

# output directory for the joined per-field time series
joined_timeseries = os.path.join(root, 'tutorials', '1_Boulder', 'data', 'plot_timeseries')
os.makedirs(joined_timeseries, exist_ok=True)
```

```python
# the four ETf/NDVI series (irrigated and unirrigated), plus the matching '_ct' columns
params = ['etf_inv_irr',
          'ndvi_inv_irr',
          'etf_irr',
          'ndvi_irr']
params += ['{}_ct'.format(p) for p in params]

join_daily_timeseries(fields=fields_gridmet,
                      gridmet_dir=met,
                      landsat_table=landsat,
                      snow=snow,
                      dst_dir=joined_timeseries,
                      overwrite=True,
                      start_date='2004-01-01',
                      end_date='2022-12-31',
                      feature_id='FID_1',
                      params=params)
```

```
100%|███████████████████████████████████████████| 78/78 [02:23<00:00,  1.85s/it]

78 fields were successfully processed
0 fields were dropped due to missing data
```
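
Conceptually, the join lines up daily meteorology with Landsat-derived values over a fixed study period. The Landsat values exist only on observation dates. The toy function below sketches that idea for a single field with pandas. It is an assumption-laden illustration (the name `join_field_daily` and the column names are hypothetical), not how `join_daily_timeseries` is actually implemented.

```python
import pandas as pd


def join_field_daily(met, rs, start_date, end_date):
    """Join one field's daily meteorology with sparse remote-sensing values.

    met: DataFrame indexed by date with daily weather columns.
    rs:  DataFrame indexed by date with Landsat-derived columns, present only
         on observation dates.
    Returns one row per day in [start_date, end_date].
    """
    days = pd.date_range(start_date, end_date, freq='D')
    # every day in the study period gets a row; gaps in `met` become NaN
    daily = met.reindex(days)
    # left join keeps all days; remote-sensing columns are NaN except on observation dates
    return daily.join(rs, how='left')
```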





The ETf Earth Engine collection over the study area is patchy in 2023, so we limited the study period to 2004 through 2022. As currently set up, the model drops a field entirely if it is missing a year of data. It is better to shorten the time coverage than to drop many fields from the analysis.
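
To check which fields a given study period would keep, you could use a sketch like the following, which counts the years with no valid observations per field. `missing_years` and `fields_kept` are hypothetical helpers that mimic the drop-on-missing-year rule described above; they are not SWIM functions. They also treat a single valid value in a year as sufficient coverage, which may be looser than the model's actual rule.

```python
import pandas as pd


def missing_years(series, start_year, end_year):
    """Years in [start_year, end_year] with no non-null values in a date-indexed Series."""
    observed = set(series.dropna().index.year)
    return [y for y in range(start_year, end_year + 1) if y not in observed]


def fields_kept(series_by_field, start_year, end_year):
    """Names of fields that have at least one valid value in every year of the period."""
    return [name for name, s in series_by_field.items()
            if not missing_years(s, start_year, end_year)]
```

Compare `fields_kept(series_by_field, 2004, 2023)` with `fields_kept(series_by_field, 2004, 2022)`, where `series_by_field` is a hypothetical dict of per-field ETf series. The comparison shows how many fields the shorter period saves.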

## 4. Write the model input file.

We now have everything we need and can run the final data preparation function, `prep_fields_json`. It brings together all the data we built and writes it to a single file that the model can read much faster than all of those .csv files.
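
Part of that speedup plausibly comes from reading one consolidated file at model start-up instead of parsing one CSV per field. As a rough illustration only, the sketch below gathers per-field CSVs into a single JSON string. The file naming (`<field_id>.csv` with the date in the first column) and the output schema are assumptions. `prep_fields_json` writes its own structure, which also folds in the properties and irrigation data.

```python
import json
import os

import pandas as pd


def load_csv_dir(csv_dir):
    """Read every <field_id>.csv in csv_dir into {field_id: DataFrame} (first column = dates)."""
    frames = {}
    for fname in sorted(os.listdir(csv_dir)):
        if fname.endswith('.csv'):
            path = os.path.join(csv_dir, fname)
            frames[os.path.splitext(fname)[0]] = pd.read_csv(path, index_col=0, parse_dates=True)
    return frames


def frames_to_json(frames):
    """Serialize {field_id: date-indexed DataFrame} into one JSON string (hypothetical schema)."""
    payload = {}
    for field_id, df in frames.items():
        payload[field_id] = {
            'dates': [d.strftime('%Y-%m-%d') for d in df.index],
            'data': {col: df[col].tolist() for col in df.columns},
        }
    return json.dumps(payload)
```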



```python
from prep.prep_plots import prep_fields_json
```

```python
# the properties and cuttings files we prepared before
properties_json = os.path.join(root, 'tutorials', '1_Boulder', 'data', 'tutorial_properties.json')
cuttings_json = os.path.join(root, 'tutorials', '1_Boulder', 'data', 'landsat', 'tutorial_cuttings.json')

# the model input file
prepped_input = os.path.join(root, 'tutorials', '1_Boulder', 'data', 'prepped_input.json')
```

```python
processed_targets, excluded_targets = prep_fields_json(properties_json, joined_timeseries, prepped_input,
                                                       target_plots=None, irr_data=cuttings_json)
```

```
100%|███████████████████████████████████████████| 78/78 [00:02<00:00, 29.00it/s]

wrote /home/dgketchum/PycharmProjects/swim-rs/tutorials/1_Boulder/data/prepped_input.json
```
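
Before moving on, it can help to peek at the structure of the file that was just written. The hypothetical `describe` helper below reports the type of each top-level entry. It makes no assumptions about the schema of `prepped_input.json`.

```python
import json


def describe(obj, max_keys=10):
    """Map each top-level key of a parsed JSON object to its value's type name.

    Only the first `max_keys` keys are reported; a non-dict root is
    summarized as a single '<root>' entry.
    """
    if isinstance(obj, dict):
        return {k: type(v).__name__ for k, v in list(obj.items())[:max_keys]}
    return {'<root>': type(obj).__name__}
```

For example, run `with open(prepped_input) as f: print(describe(json.load(f)))` after the cell above finishes.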


Pretty easy! On to running the model!