metadata review from Alison #15

Open · jordansread opened this issue Oct 28, 2020 · 0 comments

jordansread commented Oct 28, 2020

[ ] Overall

  • I’m confused, based on the titles, about how lake metadata is distributed between items 1 and 3. → Updated the titles to make it clear that 3_ is for PB configs, not metadata.
  • 2332 lakes = 1882 + 305 + 145. Where is this partitioning specified? I see type in lake_metadata.csv, but it only gives “source” or “target” and doesn’t distinguish the 305 from the extended targets. → Added an “ext_target” value to the type column for the 1882; there are now 305 lakes with “target”, 145 with “source”, and 1882 with “ext_target” (see the sketch after this list).
  • Why include FIPS/NIST in the Places? → Not sure; I think this comes from meddle or from the way the metadata is rendered by SB.
  • Process date of 7/6/2019 seems unlikely, unless we’re counting from nearer the start of the project on purpose. → Updated.
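
A minimal sketch (pandas; a local copy of lake_metadata.csv is assumed) of how the revised type column can be checked against the partition described above:

```python
# Verify the lake partitioning encoded in the `type` column of
# lake_metadata.csv (file, column, and counts are from the review above).
import pandas as pd

meta = pd.read_csv("lake_metadata.csv")
counts = meta["type"].value_counts()
print(counts)

# Expected per the review: 2332 = 305 + 145 + 1882
assert counts.get("target", 0) == 305
assert counts.get("source", 0) == 145
assert counts.get("ext_target", 0) == 1882
assert counts.sum() == 2332
```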

[ ] 1 lake info

  • Many of the Range Domain Max/Min values are NA. → Not sure I can update all of these in time for the pre-print deadline, but I would like to (see the sketch after this list).
  • Obs_depth_mean_frac is a fraction but has a maximum of 2.12. → This is explained in the metadata.
  • I’d say “even though…” instead of “even while…” where it appears in the data release. → Updated.
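
Where time allows, the missing Range Domain values could be generated straight from the data. A hedged sketch, using lake_metadata.csv as a stand-in for whichever CSV holds the affected attributes:

```python
# Compute min/max for every numeric attribute so the NA Range Domain
# entries in the metadata can be filled in from the data itself.
import pandas as pd

df = pd.read_csv("lake_metadata.csv")  # stand-in for the item's CSV
ranges = df.select_dtypes("number").agg(["min", "max"]).T
print(ranges)  # one row per numeric attribute, with its min and max
```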

[ ] 2 temp obs

  • Prmnn_I is quite cryptic. → Added text explaining that this is an NHD attribute.

[ ] 3 model configs

  • Consider specifying in the title that this child item contains configs for the process-based models only. → Updated.
  • Entity type definition for pb0: what is “expert-chosen default”? They weren’t just defaults for pb0, were they? They were the final parameters, right? → Updated to “lake-specific parameterization”.

[ ] 4 model inputs

  • The title of this item (“meteorological inputs and ice flags”) does not suggest that metamodel metafeatures will be provided in this item, but they are. → Changed the title to just “Model inputs”.
  • Range Domain Mins/Maxes are NA for several variables. → Would like to update this too, but there isn’t much time before the pre-print.
  • Prmnn_I is still cryptic here. → Updated the same way as above.
  • Diff_lathrop_strat: does 1 mean source = TRUE, target = FALSE? Or the other way around? Looking for a clear statement of the conversion from binary to integer (the sketch after this list illustrates the two candidate encodings).
  • Why wouldn’t mtl_input_metafeatures.csv contain source-target info for all source-target pairs (including those for the extra 1882)? → Saving this as a question for Jared.
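
On the diff_lathrop_strat question, a purely hypothetical illustration of the two encodings the bullet above asks to disambiguate (the metadata, not this sketch, should state which applies):

```python
# Hypothetical only: the two plausible binary-to-integer conventions
# for diff_lathrop_strat; neither is confirmed by the metadata yet.
source_strat, target_strat = True, False  # Lathrop stratification flags

diff_a = int(source_strat) - int(target_strat)  # 1 means source=TRUE, target=FALSE
diff_b = int(target_strat) - int(source_strat)  # -1 under the opposite convention
print(diff_a, diff_b)  # -> 1 -1
```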

[ ] 5 model prediction data

  • Can you title this “model predictions” instead of “model prediction data”? I think calling it data is misleading, even though this is a “data” release and all. → Updated.
  • The abstract seems to be for a different project (Minnesota and Wisconsin only; DL models as well as PB, PGDL, and PB0). → Fixed.
  • Native Data Set Environment: nice that we’ve at least nodded to the UM supercomputing system in this metadata file, but it mentions DL predictions and doesn’t mention MTL metamodel predictions. → Added the Python environment.
  • I think here you could have one entity type for all of the model predictions and then template the model. But it’s OK as-is. → Keeping as-is.
  • Pbmtl_predictions_09 is missing. → Known issue: none of the 305 targets are in group 09 (see the sketch after this list).
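
A sketch of the group check behind that known issue, assuming the lake-to-group assignment lives in lake_metadata.csv under a hypothetical group column:

```python
# If no target lakes fall in group 09, no pbmtl_predictions_09 file
# is produced; listing the groups that targets occupy shows the gap.
import pandas as pd

meta = pd.read_csv("lake_metadata.csv")
targets = meta[meta["type"] == "target"]  # the 305 target lakes
print(sorted(targets["group"].unique()))  # "group" column is an assumption
```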

[ ] 6 model evaluation

  • Could all_MTL_RMSE_predictions.csv also report predictions for the additional 1882 lakes? → Saving this as a question for Jared, too.
  • I’d suggest only reporting the top 9 (predicted) sources for each extended target, populating the predicted and actual RMSEs and predicted rank for PGDL-MTL and PGDL-MTL9 sources, and using NAs for actual/pred_pb_mtl_rmse/rank and actual_pgdl_mtl_rank. → Saving this as a question for later due to limited time.
  • The pgmtl*_evaluation.csv files report targets and RMSEs for all 2188, so that’s good. → All good.
  • It also appears that the predictions matched to obs for PGDL-MTL[9] include the extended targets, right? → Yes.
  • It also appears that the raw predictions for PGDL-MTL[9] include the extended targets? → Yes.
  • Entity Type Definition: could note that the file contains not just MTL (metamodel?) predictions but also actual results for the performance of source models applied to targets. → Updated this text.
  • The metadata file doesn’t yet describe the xx_matched_to_observations.zip files, the xx_evaluation.csv files, or the sparse_PGDL_vs_PGDL-MTL_rmse.csv file. → Updated now.
  • In sparse_PGDL_vs_PGDL-MTL_rmse.csv, it’s weird that RMSE cells for which no model was possible are filled with 0 instead of NA. → Filtered out these rows instead of using NA (or 0); thanks for pointing that out (see the sketch after this list).
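
A sketch of that filtering fix, assuming the RMSE columns in sparse_PGDL_vs_PGDL-MTL_rmse.csv carry “rmse” in their names:

```python
# Drop rows where a 0 marks "no model was possible" rather than
# writing 0 (or NA) into the released file.
import pandas as pd

rmse = pd.read_csv("sparse_PGDL_vs_PGDL-MTL_rmse.csv")
rmse_cols = [c for c in rmse.columns if "rmse" in c.lower()]  # assumed naming
keep = (rmse[rmse_cols] != 0).all(axis=1)
rmse[keep].to_csv("sparse_PGDL_vs_PGDL-MTL_rmse.csv", index=False)
```
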
jordansread pushed a commit to jordansread/pgmtl-data-release that referenced this issue Nov 8, 2020