Selecting subset of plots

Matthew Perry edited this page Jul 3, 2013 · 1 revision

The first batch run attempted to run ALL of the conditions found in the IDB database for Oregon and Washington. That is too many, especially once we consider the future addition of site index, climate scenarios and other multipliers.

We approached the problem from a different angle: using nearest-neighbor matching to find the most representative conditions.

The process goes a bit like this:

  1. Take each plot and find its 3 nearest neighbors
  2. Don't count a plot's match if the certainty is < 75% or the plot matches only itself
  3. Look at the top 75% (??) of the matches, which should give us 10% (??) of the original plots. In other words, that 10% can stand in for ~75% of the entire dataset
  4. Throw out any conditions that don't have a stand_age defined in the stdinfo file
  5. After the runs are completed, examine the error logs and throw out any "blacklisted" runs that have problems.
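
The steps above could be sketched roughly as follows. This is only an illustration, not the code used for the batch runs: the function name `select_representative`, the `(plot_id, neighbor_id, certainty)` tuple layout, and the `stand_ages` and `blacklist` arguments are all hypothetical. It also assumes one particular reading of step 3, namely ranking plots by how often they are picked as a neighbor and keeping the most-picked ones until they account for 75% of the valid matches. The nearest-neighbor search itself (step 1) is assumed to have been done already.

```python
from collections import Counter


def select_representative(matches, stand_ages, blacklist=(),
                          min_certainty=0.75, coverage=0.75):
    """Pick a small set of plots that stand in for most of the dataset.

    matches    -- iterable of (plot_id, neighbor_id, certainty) tuples,
                  i.e. the output of the nearest-neighbor step (hypothetical layout)
    stand_ages -- dict mapping plot_id to stand_age (None or missing = undefined)
    blacklist  -- plot ids whose runs had problems in the error logs
    """
    # Step 2: ignore low-certainty matches and plots that match themselves.
    valid = [(plot, neighbor) for plot, neighbor, certainty in matches
             if certainty >= min_certainty and plot != neighbor]

    # Step 3 (assumed interpretation): count how often each plot is chosen
    # as a neighbor, then keep the most frequently chosen plots until they
    # cover the requested share of all valid matches.
    counts = Counter(neighbor for _, neighbor in valid)
    total = sum(counts.values())
    selected, covered = [], 0
    for plot_id, count in counts.most_common():
        if total == 0 or covered / total >= coverage:
            break
        selected.append(plot_id)
        covered += count

    # Step 4: drop plots without a defined stand_age.
    selected = [p for p in selected if stand_ages.get(p) is not None]

    # Step 5: drop plots whose runs were blacklisted afterwards.
    bad = set(blacklist)
    return [p for p in selected if p not in bad]
```

The 75% thresholds are left as parameters because the numbers in step 3 are still marked as uncertain; rerunning with different `coverage` values shows how many plots are needed for a given share of the dataset.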