metadata review from Alison #15

Open · jordansread opened this issue Oct 28, 2020 · 0 comments

jordansread commented Oct 28, 2020

[ ] Overall

  • I’m confused, based on the titles, about how lake metadata is distributed between items 1 and 3. → Updated the titles to make it clear that 3_ is for PB configs, not metadata.
  • 2332 lakes = 1882 + 305 + 145. Where is this partitioning specified? I see type in lake_metadata.csv, but it only gives “source” or “target” and doesn’t distinguish the 305 from the extended targets. → Added an “ext_target” value to the type column for the 1882; there are now 305 lakes with “target”, 145 with “source”, and 1882 with “ext_target” (see the sketch after this list).
  • Why include FIPS/NIST in the Places? → Not sure; I think this comes from meddle or from the way the metadata is rendered by SB.
  • Process date of 7/6/2019 seems unlikely, unless we’re counting from nearer the start of the project on purpose. → Updated.
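
A minimal sketch (pandas; a local copy of lake_metadata.csv is assumed) of how the revised type column can be checked against the partition described above:

```python
# Verify the lake partitioning encoded in the `type` column of
# lake_metadata.csv (file, column, and counts are from the review above).
import pandas as pd

meta = pd.read_csv("lake_metadata.csv")
counts = meta["type"].value_counts()
print(counts)

# Expected per the review: 2332 = 305 + 145 + 1882
assert counts.get("target", 0) == 305
assert counts.get("source", 0) == 145
assert counts.get("ext_target", 0) == 1882
assert counts.sum() == 2332
```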

[ ] 1 lake info

  • Many of the Range Domain Max/Min values are NA. → Not sure I can update all of these in time for the pre-print deadline, but I would like to (see the sketch after this list).
  • Obs_depth_mean_frac is a fraction but has a maximum of 2.12. → This is explained in the metadata.
  • I’d say “even though…” instead of “even while…” where it appears in the data release. → Updated.
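
Where time allows, the missing Range Domain values could be generated straight from the data. A hedged sketch, using lake_metadata.csv as a stand-in for whichever CSV holds the affected attributes:

```python
# Compute min/max for every numeric attribute so the NA Range Domain
# entries in the metadata can be filled in from the data itself.
import pandas as pd

df = pd.read_csv("lake_metadata.csv")  # stand-in for the item's CSV
ranges = df.select_dtypes("number").agg(["min", "max"]).T
print(ranges)  # one row per numeric attribute, with its min and max
```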

[ ] 2 temp obs

  • Prmnn_I is quite cryptic. → Added text explaining that this is an NHD attribute.

[ ] 3 model configs

  • Consider specifying in the title that this child item contains configs for the process-based models only. → Updated.
  • Entity type definition for pb0: what is “expert-chosen default”? They weren’t just defaults for pb0, were they? They were the final parameters, right? → Updated to “lake-specific parameterization”.

[ ] 4 model inputs

  • The title of this item (“meteorological inputs and ice flags”) does not suggest that metamodel metafeatures will be provided in this item, but they are. → Changed the title to just “Model inputs”.
  • Range Domain Mins/Maxes are NA for several variables. → Would like to update this too, but there isn’t much time before the pre-print.
  • Prmnn_I is still cryptic here. → Updated the same way as above.
  • Diff_lathrop_strat: does 1 mean source = TRUE, target = FALSE? Or the other way around? Looking for a clear statement of the conversion from binary to integer (the sketch after this list illustrates the two candidate encodings).
  • Why wouldn’t mtl_input_metafeatures.csv contain source-target info for all source-target pairs (including those for the extra 1882)? → Saving this as a question for Jared.
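
On the diff_lathrop_strat question, a purely hypothetical illustration of the two encodings the bullet above asks to disambiguate (the metadata, not this sketch, should state which applies):

```python
# Hypothetical only: the two plausible binary-to-integer conventions
# for diff_lathrop_strat; neither is confirmed by the metadata yet.
source_strat, target_strat = True, False  # Lathrop stratification flags

diff_a = int(source_strat) - int(target_strat)  # 1 means source=TRUE, target=FALSE
diff_b = int(target_strat) - int(source_strat)  # -1 under the opposite convention
print(diff_a, diff_b)  # -> 1 -1
```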

[ ] 5 model prediction data

  • Can you title this “model predictions” instead of “model prediction data”? I think calling it data is misleading, even though this is a “data” release and all. → Updated.
  • The abstract seems to be for a different project (Minnesota and Wisconsin only; DL models as well as PB, PGDL, and PB0). → Fixed.
  • Native Data Set Environment: nice that we’ve at least nodded to the UM supercomputing system in this metadata file, but it mentions DL predictions and doesn’t mention MTL metamodel predictions. → Added the Python environment.
  • I think here you could have one entity type for all of the model predictions and then template the model. But it’s OK as-is. → Keeping as-is.
  • Pbmtl_predictions_09 is missing. → Known issue: none of the 305 targets are in group 09 (see the sketch after this list).
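
A sketch of the group check behind that known issue, assuming the lake-to-group assignment lives in lake_metadata.csv under a hypothetical group column:

```python
# If no target lakes fall in group 09, no pbmtl_predictions_09 file
# is produced; listing the groups that targets occupy shows the gap.
import pandas as pd

meta = pd.read_csv("lake_metadata.csv")
targets = meta[meta["type"] == "target"]  # the 305 target lakes
print(sorted(targets["group"].unique()))  # "group" column is an assumption
```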

[ ] 6 model evaluation

  • Could all_MTL_RMSE_predictions.csv also report predictions for the additional 1882 lakes? → Saving this as a question for Jared, too.
  • I’d suggest only reporting the top 9 (predicted) sources for each extended target, populating the predicted and actual RMSEs and predicted rank for PGDL-MTL and PGDL-MTL9 sources, and using NAs for actual/pred_pb_mtl_rmse/rank and actual_pgdl_mtl_rank. → Saving this as a question for later due to limited time.
  • The pgmtl*_evaluation.csv files report targets and RMSEs for all 2188, so that’s good. → All good.
  • It also appears that the predictions matched to obs for PGDL-MTL[9] include the extended targets, right? → Yes.
  • It also appears that the raw predictions for PGDL-MTL[9] include the extended targets? → Yes.
  • Entity Type Definition: could note that the file contains not just MTL (metamodel?) predictions but also actual results for the performance of source models applied to targets. → Updated this text.
  • The metadata file doesn’t yet describe the xx_matched_to_observations.zip files, the xx_evaluation.csv files, or the sparse_PGDL_vs_PGDL-MTL_rmse.csv file. → Updated now.
  • In sparse_PGDL_vs_PGDL-MTL_rmse.csv, it’s weird that RMSE cells for which no model was possible are filled with 0 instead of NA. → Filtered out these rows instead of using NA (or 0); thanks for pointing that out (see the sketch after this list).
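
A sketch of that filtering fix, assuming the RMSE columns in sparse_PGDL_vs_PGDL-MTL_rmse.csv carry “rmse” in their names:

```python
# Drop rows where a 0 marks "no model was possible" rather than
# writing 0 (or NA) into the released file.
import pandas as pd

rmse = pd.read_csv("sparse_PGDL_vs_PGDL-MTL_rmse.csv")
rmse_cols = [c for c in rmse.columns if "rmse" in c.lower()]  # assumed naming
keep = (rmse[rmse_cols] != 0).all(axis=1)
rmse[keep].to_csv("sparse_PGDL_vs_PGDL-MTL_rmse.csv", index=False)
```
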
jordansread pushed a commit to jordansread/pgmtl-data-release that referenced this issue Nov 8, 2020